Applying linked data initiatives to local Māori development


Project code:  SCI227

Supervisor

Andrew Sporle

This project involves examining how recent developments in linked official statistical data can be used to inform Māori development at a regional level. The New Zealand government is investing in the rapid development of an integrated data infrastructure, with the intention that it will be used to inform the government’s social investment programme, which runs to fifty billion dollars per annum. As Māori development occurs at a regional rather than a national level, Māori organisations are interested in how this world-leading data initiative can be used to inform Māori development locally. This project will involve working with Māori organisations as well as with Statistics NZ data resources.

Students should have a strong statistics and/or computer science background. Some knowledge of te reo Māori and/or Māori development would be useful.


Data handling


Project code:  SCI228

Supervisor

Christopher Wild

This project will involve working with the iNZight development team on data handling including input/output, importing metadata, reformatting, pre-processing, working with dates and times, and plotting and analysis including date and time variables. This is a good project for building data-science skills. 

Requirements

Very good R-programming skills.


Interactive R graphics


Project code:  SCI229

Supervisor

Christopher Wild

This project will involve working with the iNZight development team on developing methods for taking graphics generated in R and then interacting with those graphs via a web browser. This is a good project for building data-science skills and should suit students with interests in computing and statistics.

Requirements

Good skills in R and JavaScript programming.


Predictive model building


Project code:  SCI230

Supervisor

Christopher Wild

This project will involve working with the iNZight development team on developing methods for automatically building predictive regression models and model diagnostics for generalised linear models. 

Requirements

Very good grades in Stats 310 and 330 and very good R-programming skills.


Visualisations of concepts in probability modelling and statistics


Project code:  SCI231

Supervisor

Christopher Wild

This project will involve working with the department’s Statistics Education Research group on visualisations to improve understanding of concepts and models in probability and statistical inference.  

Requirement

Very good JavaScript programming skills.


Scoring Orienteering Performance


Project code:  SCI232

Supervisor

David Scott

The three Auckland orienteering clubs run a series of races to determine the orienteer of the year in a number of age grades for males and females. Points are assigned for each race: the winner gets 10 points, the second-placed runner 9.5 points, and so on, with the first 20 runners receiving points. However, a number of orienteers run outside their age grade, taking on more difficult courses. This makes it difficult to assign points to these runners. In this project a student will try to determine a fair system of awarding points when runners run up a grade or grades in this manner.
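The current points rule can be written down directly, which makes the difficulty concrete. The Python sketch below assumes that the 0.5-point step between places continues all the way down to 20th place (the description only says "and so on"); the function names are illustrative, not part of any existing scoring system. Because the score depends only on finishing place, nothing in it accounts for a runner having taken on a harder course than the rest of their grade.

```python
from typing import List


def race_points(place: int, max_points: float = 10.0, step: float = 0.5,
                scoring_places: int = 20) -> float:
    """Points for one finishing place (1 = winner) under the current scheme.

    Assumes the 0.5-point step continues all the way down to 20th place.
    """
    if place < 1:
        raise ValueError("place must be 1 or greater")
    if place > scoring_places:
        return 0.0  # outside the top 20: no points
    return max_points - step * (place - 1)


def season_total(places: List[int]) -> float:
    """Sum a runner's points over the races in the series."""
    return sum(race_points(p) for p in places)


if __name__ == "__main__":
    print([race_points(p) for p in range(1, 21)])  # 10.0, 9.5, ..., 0.5
    print(season_total([1, 3, 21]))  # 10.0 + 9.0 + 0.0 = 19.0
```

Under this assumed rule the 20 scoring places share 105 points per race, whichever course each runner actually ran.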

If time permits there is an additional problem to investigate, which is how to calculate points to determine the leading school in secondary school orienteering competitions in New Zealand.

Students undertaking this project need not have an understanding of orienteering. They should be competent in using R and in data analysis. Knowledge of Bayesian analysis and Bayesian analysis software may also be helpful.


Statistics Education research


Project code:  SCI233

Skills Required

The student would be required to classify assessment data and undertake exploratory analyses.

This summer project involves exploring the following questions:

  • What are the characteristics of a good multiple-choice question?
  • What effect have changes to the content taught in STATS10x had on the nature of the questions asked?
  • Can we predict student performance on a particular question, based on its characteristics?
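The project does not say which question characteristics will be examined. Two standard classical-test-theory statistics that such a characterisation often starts from are item difficulty (the proportion of students answering correctly) and item discrimination (the point-biserial correlation between the item and students' total scores). The Python sketch below computes both under a hypothetical data layout of one 0/1 response per student per question; in practice the total is often recomputed excluding the item being assessed.

```python
from statistics import mean, pstdev
from typing import List


def item_difficulty(item: List[int]) -> float:
    """Proportion of students who answered the item correctly (1 = correct, 0 = wrong)."""
    return mean(item)


def point_biserial(item: List[int], totals: List[float]) -> float:
    """Point-biserial correlation between a 0/1 item and students' total scores.

    Equivalent to the Pearson correlation between the item and the totals;
    returns NaN when everyone (or no one) answered correctly.
    """
    p = mean(item)
    sd = pstdev(totals)  # population standard deviation of the totals
    if p in (0, 1) or sd == 0:
        return float("nan")
    mean_right = mean(t for i, t in zip(item, totals) if i == 1)
    mean_wrong = mean(t for i, t in zip(item, totals) if i == 0)
    return (mean_right - mean_wrong) / sd * (p * (1 - p)) ** 0.5


if __name__ == "__main__":
    item = [1, 1, 0, 0]      # hypothetical responses to one question
    totals = [10, 8, 4, 2]   # the same students' total test scores
    print(item_difficulty(item))          # 0.5
    print(point_biserial(item, totals))   # about 0.949
```

Statistics like these could serve as response variables or predictors when asking whether performance on a question can be predicted from its characteristics.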

Graphical tools for exploring the concept of genetic drift


Project code:  SCI234

Supervisor

James Curran

Skills Required

Either very good R skills (preferably with Shiny) or very good JavaScript skills.

Genetic drift describes how the frequencies of genes change over time because of the random sampling of genes that occurs in mating. If the mating populations are finite and there is no mutation or migration, then eventually all individuals in the same population will carry the same gene variants. This process is known as fixation. The level of inbreeding at any particular time can be measured by the “population substructure” coefficient. About 20 years ago I wrote a Windows-based programme that demonstrated genetic drift and allowed students to visually explore the forces which affect it. Although this programme still works, it is unclear whether it will continue to do so in the future. I would like someone to port the code (or, more probably, rewrite it from scratch) so that it is accessible through a web browser; R + Shiny or JavaScript would be good platforms for this. This project is ideal for someone who really likes programming and graphics and has some understanding of web technologies. You will learn some elementary population genetics and the associated statistics.
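A common starting point for a drift simulator of this kind is the idealised Wright-Fisher model, in which each generation's gene copies are drawn at random from the previous generation. The sketch below is in Python only to show the logic (the project itself suggests R + Shiny or JavaScript); the function names are hypothetical, and the model is an assumption rather than a description of the existing Windows programme. The inbreeding formula is the standard textbook result for this model; whether the original programme computes its “population substructure” coefficient this way is not stated.

```python
import random
from typing import List, Optional


def wright_fisher(n_individuals: int, p0: float, max_generations: int = 10_000,
                  seed: Optional[int] = None) -> List[float]:
    """Allele-frequency trajectory under an idealised Wright-Fisher model.

    A diploid population of fixed size N carries 2N gene copies. Each new
    generation draws its 2N copies at random, with replacement, from the
    previous one. With no mutation, migration or selection the frequency
    wanders by chance alone until the allele is fixed (1.0) or lost (0.0).
    """
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(max_generations):
        if p in (0.0, 1.0):  # fixed or lost: no further change is possible
            break
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
        p = copies / two_n
        trajectory.append(p)
    return trajectory


def expected_inbreeding(n_individuals: int, generations: int) -> float:
    """Expected inbreeding coefficient after t generations, starting from F = 0:
    F_t = 1 - (1 - 1/(2N))**t under the same idealised model."""
    return 1 - (1 - 1 / (2 * n_individuals)) ** generations


if __name__ == "__main__":
    for rep in range(5):
        traj = wright_fisher(n_individuals=20, p0=0.5, seed=rep)
        print(f"replicate {rep}: frequency {traj[-1]} after {len(traj) - 1} generations")
    print(expected_inbreeding(20, 50))
```

Plotting several replicate trajectories on one set of axes, and varying N, is one simple way to let students see that smaller populations reach fixation sooner.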


Maps, graphs, and data analysis for community conservation projects


Project code:  SCI235

Supervisor

Rachel Fewster

This project will investigate creative charts and maps for data from large-scale animal trapping programmes in New Zealand, and help community volunteers with their data processing.  Ideally, the student will contribute directly to programming for our new CatchIT software, helping to produce simple, attractive graphics and statistical analyses that will appeal to Mums-and-Dads conservation volunteers throughout the country.

Programming is a key component of this project, which is suitable for students who enjoy writing code and are highly competent in R and/or some other computer language.  An interest in ecological applications is also useful.


Creating intergenerational socio-economic data using the Integrated Data Infrastructure


Project code:  SCI242

Supervisors

  • Dr Barry Milne
  • Dr Nichola Shackleton
  • Dr Andrew Sporle
  • Prof Matthias Schonlau

The Integrated Data Infrastructure (IDI) is a collection of de-identified administrative datasets (e.g., on health events, justice contacts, education enrolments, tax paid) that have been linked at the person-level for the whole New Zealand population, and made available for use by researchers under strict conditions which protect individuals’ privacy and confidentiality. The datasets linked cover different timeframes, most going back only as far as the 1980s or 1990s. However, the Department of Internal Affairs data includes birth information dating from the 1840s, with details about the child (e.g., gender, birth weight, place of birth) and their parents (e.g., age, occupation). Intriguingly, both the child and their parents are given IDs allowing for parents born in New Zealand to be linked back to their own birth records and to details of their parents, and (potentially) to the parents of their parents, etc.

This project aims to use these intergenerational links to answer the following questions: (i) how many generations can be determined, and what is the total number of people at each generation? (ii) can occupational socioeconomic status be coded back through the generations? and, if so, (iii) how has intergenerational socioeconomic mobility changed across New Zealand’s history? Other questions that might be explored, time permitting, are: (iv) do socioeconomic influences on mortality (or other health outcomes) extend back generations? and, if so, (v) how much of the ethnic differential in mortality (or other health outcomes) can be explained by intergenerational socioeconomic differences?
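Question (i) amounts to following parent IDs back through birth records until the chain stops. The actual IDI tables and ID fields are not described here, so the Python sketch below uses a hypothetical, simplified mapping from each person's ID to their mother's and father's IDs; the project work itself would be done in SAS within the IDI, and this only illustrates the traversal logic. A parent can only be followed further back if they have a linked birth record of their own.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, simplified stand-in for linked birth records:
# person ID -> (mother ID, father ID); None means no parent ID is recorded.
BirthRecords = Dict[str, Tuple[Optional[str], Optional[str]]]


def ancestors_by_generation(records: BirthRecords, person: str) -> List[Set[str]]:
    """Linked ancestors of `person`, grouped by generation (parents first).

    A parent can only be followed further back if they have a birth record of
    their own (e.g. were born in New Zealand). Each ancestor is counted once,
    at the nearest generation in which they appear.
    """
    generations: List[Set[str]] = []
    seen = {person}
    current = {person}
    while current:
        parents: Set[str] = set()
        for pid in current:
            for parent in records.get(pid, (None, None)):
                if parent is not None and parent not in seen:
                    parents.add(parent)
        if parents:
            generations.append(parents)
        seen |= parents
        current = parents
    return generations


def depth_counts(records: BirthRecords) -> Dict[int, int]:
    """How many people can be traced back 0, 1, 2, ... generations."""
    counts: Dict[int, int] = defaultdict(int)
    for person in records:
        counts[len(ancestors_by_generation(records, person))] += 1
    return dict(counts)


if __name__ == "__main__":
    records: BirthRecords = {
        "child": ("mum", "dad"),
        "mum": ("gran", None),   # mum's birth record names only her mother
        "dad": (None, None),
        "gran": (None, None),
    }
    print(ancestors_by_generation(records, "child"))
    print(depth_counts(records))
```

The tally from `depth_counts` gives one possible answer to "how many generations can be determined": the number of people whose longest linked line reaches back each number of generations.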

The student working on this project will need to have excellent SAS skills, be willing to learn Stata and occupation coding, and be prepared to work under Statistics New Zealand’s strict privacy and confidentiality conditions.


Beat the leak: A new device designed for pelvic floor muscle training


Project code:  SCI244

Department

Statistics

We are developing a novel smart device designed to assist women with their pelvic floor muscle exercises. The FemFit is a wearable intra-vaginal pressure sensor capable of measuring pressures from the abdomen and the pelvic floor during exercise and activities of daily living. The FemFit consists of an array of eight pressure sensors which transmit pressure signals via Bluetooth to an Android device. Each sensor samples pressure at approximately 100 Hz. We are at the point of testing the repeatability and reliability of the FemFit in a population of healthy women.

A student is required to undertake analysis of the data from the sensors, with a particular emphasis on the repeatability of the measurements.
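Repeatability of this kind is often summarised with an intraclass correlation coefficient (ICC). The description does not say which repeatability statistic or ICC form will be used, so the Python sketch below is illustrative only: it assumes the raw sensor streams have first been reduced to one summary value per woman per test session (for example, a peak pressure from one sensor), and computes the one-way random-effects ICC(1,1). The function name is hypothetical. That reduction step matters in practice, since eight sensors sampling at about 100 Hz produce roughly 800 readings per second of recording.

```python
from statistics import mean
from typing import List


def icc_oneway(scores: List[List[float]]) -> float:
    """One-way random-effects intraclass correlation, ICC(1,1).

    `scores[i][j]` is the summary measurement for subject i in session j; every
    subject must have the same number of sessions k >= 2.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 sessions")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Hypothetical peak pressures for three women, each measured in two sessions.
    print(icc_oneway([[10, 12], [20, 18], [30, 31]]))  # about 0.985
```

Values near 1 mean that differences between women dominate session-to-session variation. Other ICC forms, such as two-way models that treat session as a separate effect, may suit the actual study design better.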

This is a combined project with the Auckland Bioengineering Institute and would suit a student who is capable of handling large data sets and has a genuine interest in clinical research. This is a great opportunity to be involved in research at an early stage and to contribute to the development of the FemFit. Some knowledge of physiology, to help interpret the in vivo data, would be helpful, as would experience with MATLAB.


Towards better measures of income in social surveys: An empirical investigation of measurement error and missing data using the Integrated Data Infrastructure


Project code:  SCI245

Department

Statistics

This project involves working with the New Zealand Census and the Integrated Data Infrastructure. The Integrated Data Infrastructure is a collection of de-identified administrative data sets that have been linked at the person level for the whole of the New Zealand population, and made available for use by researchers under strict conditions which protect individuals’ privacy and confidentiality. The accuracy of income estimates from social surveys is compromised by low response rates to questions about income, and by response error in the sources of income and the amount of income reported.

This project involves making comparisons between income recorded in the Inland Revenue tax data and self-reported income recorded in the Census. The project involves highlighting discrepancies between self-reported income and income measured on tax records, testing estimates from multiple imputation models in the Census against observed income in the Inland Revenue tax data, and creating bias adjustments to align the self-reported income in the Census to income reported on tax records.
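Multiple imputation produces one estimate per completed dataset, and those estimates are conventionally combined using Rubin's rules before being compared with anything else. The project does not say how its imputed estimates will be pooled, so the Python sketch below illustrates that standard approach under stated assumptions; the function name and inputs are hypothetical. The pooled estimate and its total variance could then, for example, be set against the corresponding figure computed from Inland Revenue tax data.

```python
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine results from m imputed datasets using Rubin's rules.

    Returns the pooled estimate (the mean of the m estimates) and its total
    variance T = W + (1 + 1/m) * B, where W is the average within-imputation
    variance and B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)
    w = mean(variances)
    b = variance(estimates)  # sample variance, divisor m - 1
    return q_bar, w + (1 + 1 / m) * b


if __name__ == "__main__":
    # Hypothetical: mean income estimated in each of three imputed datasets,
    # with each estimate's squared standard error.
    q, t = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
    print(q, t)  # pooled estimate 12.0, total variance about 6.33
```

The between-imputation term B is what captures extra uncertainty from the missing income values; ignoring it would make imputed estimates look more precise than they are.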

The results of this project will be: 1) estimates of the discrepancies between self-reported and recorded income, with the magnitude and direction of the discrepancies investigated across social groups; 2) best practice for imputing missing self-reported income, based on tests of different imputation models and methodologies; and 3) bias-adjustment weights that align reports of income in the Census with official Inland Revenue records. Time permitting, this work will be developed into a paper to be submitted to an academic journal, with the student named as an author.
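The first and third outcomes can be illustrated on linked person-level records. The Python sketch below assumes a hypothetical record layout of (social group, self-reported income, tax-record income) and uses a simple per-group ratio adjustment, which rescales reported income so that each group's total matches the tax-record total. This is only one possible form of bias adjustment and is not necessarily the one the project will adopt.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical linked record: (social group, self-reported income, tax-record income).
Record = Tuple[str, float, float]


def discrepancy_by_group(records: List[Record]) -> Dict[str, float]:
    """Mean of (self-reported - tax-record) income within each group.

    Negative values mean the group under-reports relative to tax records.
    """
    diffs: Dict[str, List[float]] = defaultdict(list)
    for group, reported, recorded in records:
        diffs[group].append(reported - recorded)
    return {g: mean(d) for g, d in diffs.items()}


def ratio_adjustment(records: List[Record]) -> Dict[str, float]:
    """Per-group factor that rescales total self-reported income to the tax total.

    Assumes each group's total self-reported income is non-zero.
    """
    reported_sum: Dict[str, float] = defaultdict(float)
    recorded_sum: Dict[str, float] = defaultdict(float)
    for group, reported, recorded in records:
        reported_sum[group] += reported
        recorded_sum[group] += recorded
    return {g: recorded_sum[g] / reported_sum[g] for g in reported_sum}


if __name__ == "__main__":
    linked = [("A", 50.0, 55.0), ("A", 30.0, 35.0), ("B", 40.0, 38.0), ("B", 60.0, 62.0)]
    print(discrepancy_by_group(linked))  # {'A': -5.0, 'B': 0.0}
    print(ratio_adjustment(linked))      # {'A': 1.125, 'B': 1.0}
```

Note that group B shows no average discrepancy even though individuals in it misreport in both directions, which is why the direction and spread of discrepancies, not just their mean, are worth examining.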

The student working on this project will need to have excellent data analysis skills, be willing to learn Stata, and be prepared to work under Statistics New Zealand’s strict privacy and confidentiality conditions.
