- Applying linked data initiatives to local Māori development
- Data handling
- Interactive R graphics
- Predictive model building
- Visualisations of concepts in probability modelling and statistics
- Scoring Orienteering Performance
- Statistics Education research
- Graphical tools for exploring the concept of genetic drift
- Maps, graphs, and data analysis for community conservation projects
- Creating intergenerational socio-economic data using the Integrated Data Infrastructure
- Beat the leak: A new device designed for pelvic floor muscle training
- Towards better measures of income in social surveys: An empirical investigation of measurement error and missing data using the Integrated Data Infrastructure
Applying linked data initiatives to local Māori development
Project code: SCI227
This project involves examining how recent developments in linked official statistical data can be used to inform Māori development at a regional level. The New Zealand government is investing in the rapid development of an integrated data infrastructure, with the intention that it will be used to inform the government’s social investment programme of fifty billion dollars per annum. Because Māori development occurs at a regional rather than a national level, Māori organisations are interested in how this world-leading data initiative can be used to inform Māori development at a local level. The project will involve working with Māori organisations as well as with Statistics NZ data resources.
Students should have a strong statistics and/or computer science background. Some knowledge of te reo Māori and/or Māori development would be useful.
Data handling
Project code: SCI228
This project will involve working with the iNZight development team on data handling, including input/output, importing metadata, reformatting, pre-processing and working with dates and times, as well as plotting and analysis involving date and time variables. This is a good project for building data-science skills.
Students should have very good R-programming skills.
Interactive R graphics
Project code: SCI229
This project will involve working with the iNZight development team on developing methods for taking graphics generated in R and interacting with them via a web browser. This is a good project for building data-science skills and should suit students with interests in both computing and statistics.
Predictive model building
Project code: SCI230
This project will involve working with the iNZight development team on developing methods for automatically building predictive regression models and model diagnostics for generalised linear models.
Students should have very good grades in STATS 310 and STATS 330, and very good R-programming skills.
Visualisations of concepts in probability modelling and statistics
Project code: SCI231
This project will involve working with the department’s Statistics Education Research group on visualisations to improve understanding of concepts and models in probability and statistical inference.
Scoring Orienteering Performance
Project code: SCI232
The three Auckland orienteering clubs run a series of races to determine the orienteer of the year in a number of age grades for males and females. Points are assigned for each race: the winner gets 10 points, the second-placed runner 9.5 points, and so on, with the first 20 runners receiving points. However, a number of orienteers run outside their age grade, taking on more difficult courses. This makes it hard to assign points fairly to these runners. In this project, a student will try to determine a fair system of awarding points when runners run up a grade or grades in this manner.
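As a reference point for the grading problem, the baseline scheme for runners in their own grade can be written as a small function. This is a sketch only: the text gives just the first two values, so it assumes the 0.5-point step between consecutive places continues all the way down to 20th place (which would then earn 0.5 points). The function and parameter names are illustrative, not taken from the clubs' rules.

```python
def race_points(place: int, top_points: float = 10.0, step: float = 0.5,
                scoring_places: int = 20) -> float:
    """Points for a finishing place within a runner's own age grade.

    Assumption (the text only says "etc."): each place below first earns
    `step` fewer points than the place before, down to `scoring_places`.
    """
    if place < 1:
        raise ValueError("place must be 1 or greater")
    if place > scoring_places:
        return 0.0  # only the first `scoring_places` runners score
    return top_points - step * (place - 1)


if __name__ == "__main__":
    for place in (1, 2, 10, 20, 21):
        print(place, race_points(place))
```

The open question of the project, how to adjust these points when someone runs a harder course, is deliberately left out of the sketch.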
If time permits, there is an additional problem to investigate: how to calculate points to determine the leading school in New Zealand secondary school orienteering competitions.
Students undertaking this project need not have an understanding of orienteering. They should be competent in using R and in data analysis. Knowledge of Bayesian analysis and Bayesian analysis software may also be helpful.
Statistics Education research
The student would be required to classify assessment data and undertake exploratory analyses.
This summer project involves exploring the following questions:
- What are the characteristics of a good multiple-choice question?
- What effect have changes to the content taught in STATS10x had on the nature of the questions asked?
- Can we predict student performance on a particular question, based on its characteristics?
Graphical tools for exploring the concept of genetic drift
Project code: SCI234
Maps, graphs, and data analysis for community conservation projects
Project code: SCI235
This project will investigate creative charts and maps for data from large-scale animal trapping programmes in New Zealand, and help community volunteers with their data processing. Ideally, the student will contribute directly to programming for our new CatchIT software, helping to produce simple, attractive graphics and statistical analyses that will appeal to Mums-and-Dads conservation volunteers throughout the country.
Programming is a key component of this project, which is suitable for students who enjoy writing code and are highly competent in R and/or some other computer language. An interest in ecological applications is also useful.
Creating intergenerational socio-economic data using the Integrated Data Infrastructure
Project code: SCI242
The Integrated Data Infrastructure (IDI) is a collection of de-identified administrative datasets (e.g., on health events, justice contacts, education enrolments, tax paid) that have been linked at the person level for the whole New Zealand population, and made available for use by researchers under strict conditions which protect individuals’ privacy and confidentiality. The linked datasets cover different timeframes, most going back only as far as the 1980s or 1990s. However, the Department of Internal Affairs data include birth information dating from the 1840s, with details about the child (e.g., gender, birth weight, place of birth) and their parents (e.g., age, occupation). Intriguingly, both the child and their parents are given IDs, allowing parents born in New Zealand to be linked back to their own birth records and to details of their parents, and (potentially) to the parents of their parents, and so on.
This project aims to make use of these intergenerational links to answer the following questions: (i) How many generations can be determined, and what is the total number at each generation? (ii) Can occupational socio-economic status be coded back through the generations? (iii) If so, how has intergenerational socio-economic mobility changed across New Zealand’s history? Other questions that might be explored, time permitting, are: (iv) Do socio-economic influences on mortality (or other health outcomes) extend back generations? (v) If so, how much of the ethnic differential in mortality (or other health outcomes) can be explained by intergenerational socio-economic differences?
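Question (i) amounts to following parent IDs back through linked birth records. The sketch below assumes each birth record can be reduced to a person ID plus a mother ID and a father ID; the IDs are made up, and the real IDI tables and field names are not shown here. It computes the longest chain of linked birth records behind each person and then tallies people by that depth, which is one possible reading of "the total number at each generation". The project calls for SAS skills, so the Python here only illustrates the logic.

```python
from collections import Counter


def generation_depths(births):
    """Length of the longest chain of linked birth records behind each person.

    `births` maps a person ID to a (mother_id, father_id) pair. A parent ID
    of None, or one with no birth record of its own (e.g. born overseas),
    ends that line. A person with no linked parents has depth 1.
    """
    depths = {}

    def depth(pid, path):
        if pid is None or pid not in births:
            return 0
        if pid in depths:
            return depths[pid]
        if pid in path:
            raise ValueError(f"parent links form a cycle at {pid!r}")
        path.add(pid)
        mother, father = births[pid]
        result = 1 + max(depth(mother, path), depth(father, path))
        path.remove(pid)
        depths[pid] = result
        return result

    for pid in births:
        depth(pid, set())
    return depths


# Hypothetical records: person ID -> (mother ID, father ID).
BIRTHS = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": (None, None),
    "E": ("C", "D"),
    "F": ("E", "X"),  # "X" has no New Zealand birth record
}

depths = generation_depths(BIRTHS)
print(max(depths.values()))                      # 4
print(sorted(Counter(depths.values()).items()))  # [(1, 3), (2, 1), (3, 1), (4, 1)]
```

The cycle check is there because linkage errors in real administrative data could, in principle, make a person appear among their own ancestors.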
The student working on this project will need to have excellent SAS skills, be willing to learn STATA and occupation coding, and be prepared to work under Statistics New Zealand’s strict privacy and confidentiality conditions.
Beat the leak: A new device designed for pelvic floor muscle training
We are developing a novel smart device designed to assist women with their pelvic floor muscle exercises. The FemFit is a wearable intra-vaginal pressure sensor capable of measuring pressures from the abdomen and the pelvic floor during exercise and activities of daily living. It consists of an array of eight pressure sensors that transmit pressure signals via Bluetooth to an Android device, with each sensor sampling pressure at ~100 Hz. We are at the point of testing the repeatability and reliability of the FemFit in a population of healthy women.
A student is required to undertake analysis of the data from the sensors, with a particular emphasis on the repeatability of the measurements.
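Repeatability is often summarised with an intraclass correlation coefficient (ICC). The text does not say which statistic the project will use, so what follows is only an example of one common choice, the one-way random-effects ICC(1,1). It assumes each participant has been measured the same number of times and that the raw signals (eight sensors at ~100 Hz, roughly 800 samples per second in total) have already been reduced to one summary value per session, such as a peak pressure. That reduction step and the numbers below are hypothetical.

```python
def icc_oneway(measurements):
    """One-way random-effects, single-measurement ICC, i.e. ICC(1,1).

    `measurements` is a list with one inner list per subject; every inner
    list holds the same number k >= 2 of repeated measurements.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(measurements[0])
    if k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("each subject needs the same number (>= 2) of repeats")

    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    subject_means = [sum(row) / k for row in measurements]

    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(measurements, subject_means)
                    for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical peak pressures: two sessions per participant, arbitrary units.
sessions = [[10.0, 12.0], [20.0, 18.0], [30.0, 31.0]]
print(round(icc_oneway(sessions), 3))  # 0.985
```

Values close to 1 indicate that differences between sessions are small relative to differences between women; other ICC forms (for example, two-way models that treat session as a fixed or random effect) may suit the study design better.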
This is a combined project with the Auckland Bioengineering Institute and would suit a student who is capable of handling large data sets and has a genuine interest in clinical research. It is a great opportunity to be involved in research at an early stage and to contribute to the development of the FemFit. Knowledge of physiology, to help interpret the in-vivo data, would be helpful, as would experience with MATLAB.
Towards better measures of income in social surveys: An empirical investigation of measurement error and missing data using the Integrated Data Infrastructure
This project involves working with the New Zealand Census and the Integrated Data Infrastructure. The Integrated Data Infrastructure is a collection of de-identified administrative data sets that have been linked at the person level for the whole of the New Zealand population, and made available for use by researchers under strict conditions which protect individuals’ privacy and confidentiality. The accuracy of income estimates from social surveys is compromised by low response rates to questions about income, and by response error in both the reported sources and the reported amounts of income.
This project involves making comparisons between income recorded in the Inland Revenue tax data and self-reported income recorded in the Census. It involves highlighting discrepancies between self-reported income and income measured on tax records, testing estimates from multiple imputation models in the Census against observed income in the Inland Revenue tax data, and creating bias adjustments to align self-reported income in the Census with income reported on tax records.
The results of this project will be: (1) estimates of the discrepancies between self-reported and tax-recorded income, with the magnitude and direction of the discrepancies investigated across social groups; (2) best practice for imputing missing self-reported income, based on tests of different imputation models and methodologies; and (3) bias adjustment weights that align reports of income in the Census with official records from Inland Revenue. Time permitting, this will be developed into a paper for submission to an academic journal, with the student named as an author.
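The comparison in (1) and a very simple form of the adjustment in (3) can be sketched for people who have both a self-reported Census income and an Inland Revenue tax record. This is illustrative only: the grouping variable, the record layout and the numbers are hypothetical, and a single ratio factor per group (total tax-recorded income divided by total self-reported income) stands in for the bias adjustment weights, whose actual form the project would determine.

```python
from collections import defaultdict


def discrepancy_summary(records):
    """Per-group mean discrepancy and ratio adjustment factor.

    `records` yields (group, self_reported, tax_recorded) tuples for people
    with both incomes observed. Discrepancy = self-reported minus
    tax-recorded, so a negative mean means the group under-reports on average.
    """
    totals = defaultdict(lambda: {"n": 0, "reported": 0.0, "tax": 0.0})
    for group, reported, tax in records:
        t = totals[group]
        t["n"] += 1
        t["reported"] += reported
        t["tax"] += tax

    summary = {}
    for group, t in totals.items():
        summary[group] = {
            "n": t["n"],
            "mean_discrepancy": (t["reported"] - t["tax"]) / t["n"],
            # Scaling self-reported income by this factor reproduces the
            # group's tax-recorded total (a hypothetical stand-in for the
            # project's bias adjustment weights).
            "ratio_adjustment": (t["tax"] / t["reported"]
                                 if t["reported"] else float("nan")),
        }
    return summary


# Hypothetical linked records: (group, self-reported income, tax-recorded income).
records = [
    ("group A", 40000, 42000),
    ("group A", 30000, 33000),
    ("group B", 60000, 57000),
]
for group, stats in discrepancy_summary(records).items():
    print(group, stats)
```

A ratio factor only corrects the group total; testing imputation models, as in (2), would need the individual-level comparison as well.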
The student working on this project will need to have excellent data analysis skills, be willing to learn STATA, and be prepared to work under Statistics New Zealand’s strict privacy and confidentiality conditions.