Statistics

Applications are now closed

What are good design principles for statistics learning interactives?


Supervisor

Anne-Marie Fergusson

Discipline

Statistics

Project code: SCI227

Data-driven interactives have the potential to improve the learning of statistical ideas. However, further research is needed to establish what design principles should be followed when creating interactives specifically to teach statistical ideas. This research project will explore the design of data-driven interactives from the dual perspectives of user and learner, and will include piloting the interactives with a small group of people. As this research project will involve creating new statistics learning interactives, the student will need experience coding interactive webpages (using JavaScript/jQuery/D3 or similar) and an interest in statistics education.

Identifying variable responses to a stressor


Supervisor

Judi Hewitt

Discipline

Statistics

Project code: SCI228

This project will investigate whether the response of species abundances to sediment mud content is constant over time, or is affected by increasing delivery of mud to the site. The analytical techniques used will be analysis of covariance, generalised linear modelling, and some simple time series models. It will use data from seven Auckland east coast estuaries.

Identifying environmental degradation


Supervisor

Judi Hewitt

Discipline

Statistics

Project code: SCI229

This project will use methods that identify breaks (sudden changes from pre-existing patterns) in time series to investigate potential changes over the last 20 years in biodiversity of an Auckland harbour. 

Applicants need an understanding of the requirements for autoregressive time series analysis.

Spatial modelling of abundances


Supervisor

Judi Hewitt

Discipline

Statistics

Project code: SCI230

This project will investigate spatial modelling techniques to map abundances of selected species across intertidal areas. It will include analysis of general patch size using Moran’s I coefficient, and kriging both based purely on spatial location and incorporating environmental factors. It will have a particular focus on handling barriers to the occurrence of intertidal species (e.g., deep channels and land).

GIS skills will be required.
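
As a rough illustration of the Moran’s I coefficient mentioned above, the following minimal Python sketch computes the statistic for a handful of sites. The function names, the straight-line site layout and the binary neighbour weights are all assumptions made for this example; they are not part of the project or its data.

```python
def chain_weights(n):
    """Binary neighbour weights for n sites along a line (hypothetical layout).

    Sites i and j are treated as neighbours only if they are adjacent.
    """
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,

    where d_i is the deviation of site i from the mean and W is the
    sum of all weights. Assumes the values are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    spread = sum(d * d for d in dev)
    return (n / w_total) * (cross / spread)


# Made-up abundances at four sites: a smooth gradient and an alternating pattern.
gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights(4))
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain_weights(4))
print(gradient, alternating)
```

Positive values indicate that neighbouring sites tend to have similar abundances, and negative values indicate that neighbours tend to differ. Real intertidal data would need weights that reflect barriers such as channels and land.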

Can records of lost pet birds predict the establishment of new populations in the wild?


Supervisor

Rachel Fewster

Margaret Stanley (SBS)

Discipline

Statistics

Project code: SCI231

We have created a database of pet birds reported lost, built by scanning ‘Trademe’ and other websites every few days for two years. Our question is whether these records of lost pets successfully predict the establishment of new exotic populations in the wild, an effect attributed to “propagule pressure” (the number of individuals released and the number of release events).

This studentship will involve cleaning and analysis of the lost-bird dataset to investigate whether these records can usefully predict the establishment of potentially harmful exotic species. Outcomes will contribute to the National Pest Pet Biosecurity Accord. Students should be capable R users and interested in biological applications of statistics.

Survey statistics in a database


Supervisor

Thomas Lumley

Discipline

Statistics

Project code: SCI232

Multistage surveys can give rise to moderately large data sets (tens of millions of rows). Most current software for survey analysis reads the data into memory, but most of the computations can actually be expressed as database operations. In this project you would work on implementing and testing some survey computations using the ‘dplyr’ R package as a database interface.

Prerequisites: good knowledge of R, and either ‘dplyr’ or SQL.
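
To illustrate how a survey computation can be pushed into the database, here is a minimal sketch of weighted totals and means computed with a single SQL aggregation. It uses Python's built-in sqlite3 module rather than the ‘dplyr’ interface the project would use, and the table name, column names and values are invented for the example.

```python
import sqlite3

# In-memory database standing in for a large survey table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (stratum TEXT, weight REAL, income REAL)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [
        ("urban", 100.0, 50.0),
        ("urban", 100.0, 70.0),
        ("rural", 250.0, 40.0),
        ("rural", 250.0, 60.0),
    ],
)


def weighted_estimates(connection):
    """Weighted total and weighted mean per stratum, computed in the database.

    Only the small summary table is returned to the client; the raw rows
    never need to be read into memory.
    """
    query = """
        SELECT stratum,
               SUM(weight * income)               AS est_total,
               SUM(weight * income) / SUM(weight) AS est_mean
        FROM survey
        GROUP BY stratum
        ORDER BY stratum
    """
    return connection.execute(query).fetchall()


for stratum, est_total, est_mean in weighted_estimates(conn):
    print(stratum, est_total, est_mean)
```

Point estimates like these reduce to grouped sums, which is why they map naturally onto database operations. Variance estimation for multistage designs is more involved and is not shown here.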

Species distribution maps from spatial capture-recapture models


Supervisor

Ben Stevenson

Discipline

Statistics

Project code: SCI233

Spatial capture-recapture (SCR) models are capable of estimating animal density across a landscape. There are a few ways to produce species distribution maps from these: some have sound theoretical justification, but require the user to make various assumptions about the data; others claim to free the user of some of these assumptions, but whether or not they are appropriate is an open question.

This project will investigate and compare the different methods used to create species distribution maps from fitted SCR models.

This project requires some experience using R, and an interest in programming. Some familiarity with Bayesian methods and the BUGS language would be beneficial, but is not necessary.

A comparison of spatial capture-recapture and random encounter models for camera trap data


Supervisor

Ben Stevenson

Discipline

Statistics

Project code: SCI234

Camera-trap surveys are commonly used to estimate density of wildlife populations. Over the last decade, spatial capture-recapture (SCR) and random encounter models (REMs) have gained traction in their application to the resulting data. They each require slightly different information: for example, SCR usually needs individuals to be recognised when they are detected, while REMs usually require a priori knowledge of average animal speeds. The two methods also make different assumptions about the way animals move and behave. This project will aim to assess and compare the performance of SCR and REM estimators.

This project requires some experience using R, and an interest in programming.

Are correlates of loneliness similar across the life-course?


Supervisor

Barry Milne

Roy Lay-Yee

Discipline

Statistics

Project code: SCI235

This project aims to investigate the risk factors for loneliness across the life-course, using data from a cross-age survey, the International Social Survey Programme (ISSP) survey for 2017.

The ISSP 2017 survey is on ‘Social Networks’, and contains items assessing loneliness, as well as data on a number of socio-demographic and attitudinal risk factors. The student will explore whether risk factors for loneliness vary across age, and will compare results against data from two longitudinal studies: the Dunedin longitudinal study (a birth cohort followed to age 45), and the LILACS study of aging (following participants from age 80).

The student working on this project will need to know how to use SPSS and how to undertake regression analyses.

Testing the healthy immigrant hypothesis: obesity in 4-year-olds


Supervisor

Nichola Shackleton

Barry Milne

Discipline

Statistics

Project code: SCI236

This project will use information from the Integrated Data Infrastructure to investigate links between immigration and child obesity. The healthy immigrant hypothesis states that those who migrate are healthier than those they leave behind.

We aim to investigate the healthy immigrant hypothesis by comparing child obesity rates by immigration status (generations/length of time in NZ). Specifically, we will investigate:

  1. If immigrant children have lower obesity rates than non-immigrant children, and how this varies by length of time in New Zealand and country of origin.
  2. If children of immigrants born in New Zealand have lower obesity rates than children of non-immigrants, and how this varies by length of time the parents have been in New Zealand and country of origin.
  3. If the known deprivation gradients and ethnic differences in child obesity differ by immigration status, and whether this varies by length of time in New Zealand.

The student working on this project will need to have excellent SAS and/or Stata skills, and be prepared to work under Statistics New Zealand’s strict privacy and confidentiality conditions.

The t-SNE algorithm in R and Python


Supervisor

James Curran

Discipline

Statistics

Project code: SCI237

The t-Distributed Stochastic Neighbour Embedding (t-SNE) algorithm is a dimensionality reduction technique for multivariate data. An implementation of the t-SNE algorithm in R has shown differing output and slower computation in comparison to another implementation in Python (see https://goo.gl/zS8XvJ). This project aims to investigate this discrepancy: is it due to a bug, or something else? Can a better implementation in R outperform Python? A possible end goal of this project is to submit a polished R package to CRAN.

This project requires moderate C++ skills, R skills and an understanding of Python. Knowledge of Java would also be beneficial.

Accessible graphics for data on maps


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI238

This project will investigate ways in which people display data about what is happening in different geographical regions, including comparing time periods, comparing results for different variables, and displaying several variables at the same time. It will work towards developing software tools that enable best-practice displays with a minimum of user input. This will also involve automatically finding and using relevant shape files from the web. This is a good project for building data-science skills that should suit students with interests in computing and statistics.

Requirements: Good skills in R programming.

Interactive graphics with R


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI239

This project will involve researching and implementing interactive web graphs for R-generated plots, including calling back to R from the webpage for new information and updating plots without completely redrawing them. This is a good project for building data-science skills that should suit students with interests in computing and statistics.

Requirements: Good skills in R and JavaScript programming.

Data wrangling tools in Shiny and Gtk


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI240

This project will involve investigating and drawing lessons from the available interactive data wrangling systems and the R tidyverse tools to scope and begin developing a Shiny app that not only makes data wrangling easy for users but also writes out the R code it is using, in order to provide an audit trail and to aid reproducibility and the learning of data wrangling in R. This is a good project for building data-science skills that should suit students with interests in computing and statistics.

Requirements: Very good R-programming skills.

Date and time-stamped data


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI241

The subject of this project is working with time-stamped data, and developing wrangling, plotting and analysis capabilities for such data. This is a good project for building data-science skills.

Requirements: Very good R-programming skills.

Predictive model building


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI242

This project will involve working with the iNZight development team on developing methods for automatically building predictive regression models and model diagnostics for generalised linear models and survival models.

Requirements: Very good grades in Stats 310 and 330 and very good R-programming skills.

Visualisations of concepts in probability modelling and statistics


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI243

This project will involve working with the department’s Statistics Education Research group on visualisations to improve understanding of concepts and models in probability and statistical inference.

Requirements: Good JavaScript programming skills, or a proven record in learning new computer systems from online resources (outside of formal courses).

Visualisation of multiple-response data from complex surveys


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI244

iNZight already has tools for displaying and analysing multiple-response data from random samples. This project will involve generalising these methods so that they work with data obtained from more general survey sampling designs.

Requirements: Very good grades in Stats 310 and 330 and very good R-programming skills.

Topics in data visualisation


Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI245

A student may wish to explore other topics, such as providing colour capabilities (including using colour metrics to choose palettes of maximally different colours), developing Shiny apps that allow sensitive data to be analysed and displayed while the data itself is kept secure, or developing tools for survival analysis.

Requirements: Very good R-programming skills.