Statistics

Estimating frog density from acoustic surveys: A comparison of spatial capture-recapture methods

Supervisor

Ben Stevenson

Discipline

Statistics

Project code: SCI208

Some animal species are difficult to see or catch, but easy to hear. Acoustic surveys are a cost-effective way to assess these populations. Analysing the resulting data using a spatial capture-recapture (SCR) model provides an estimate of animal population density (see the following link for a video example: https://youtu.be/JTYFYtZJXro). This project will compare the performance of three existing SCR methods, both by simulation and by application to data from an acoustic survey of South Africa’s Cape Peninsula moss frog.

Requirements: Good R programming skills.
 

Text analytics

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI209

This project will involve researching and building easy-to-use tools for obtaining and analysing data that comes in the form of text, whether it be data scraped from Twitter feeds, blog posts and other social media, collections of emails, fan fiction, reports and other documents, or even novels. Some of the core questions people ask of data like this are: what are these people talking about, how do they feel about the issues under discussion, and how are these sentiments changing? This is a good project for learning to analyse text data and for building data-science skills that leverage the extensive capabilities already in R, and it should suit students with interests in computing and statistics. It is also an opportunity to learn valuable skills from the experienced members of the iNZight team.
Requirements: Good skills in R programming.
 

Analytics for date and time-stamped data

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI210

The subject of this project is working with data that has date and time fields telling us when things happened. The project involves developing wrangling, plotting and analysis capabilities for such data. Can we sensibly automate how times and dates are handled by plots, in conjunction with other variables, in many useful situations? This is a good project for learning to work with time-stamped data and building data-science skills, and it should suit students with interests in computing and statistics. It is also an opportunity to learn valuable skills from the experienced members of the iNZight team.
Requirements: Good skills in R programming.
 

Predictive analytics

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI211

This project will involve working with the iNZight development team on developing capabilities for building predictive models using automated ensemble tools such as TensorFlow and Caret, and simpler, more understandable tools such as regression models and classification and regression trees. The project will also involve model diagnostics for generalised linear models and survival models. This is a good project for building data-science skills, and it should suit students with interests in computing and statistics. It is also an opportunity to learn valuable skills from the experienced members of the iNZight team.
Requirements: Very good grades in Stats 310 and 330 and good R-programming skills.
 

Interactive graphics with R

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI212

This project will involve researching and implementing interactive web graphs for R-generated plots, including 3-D plots. This includes calling back to R from the webpage for new information and updating plots without having to redraw them completely. This is also a good project for building data-science skills, and it should suit students with interests in computing and statistics. It is also an opportunity to learn valuable skills from the experienced members of the iNZight team.
Requirements: Good skills in R and JavaScript programming.
 

Data wrangling tools

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI213

This project will involve investigating and drawing lessons from the available interactive data wrangling systems and the R tidyverse tools in order to scope and develop a Shiny app that not only makes data wrangling easy for users but also writes out the R code it is using, both to provide an audit trail and to aid reproducibility and the learning of data wrangling in R. This is a good project for building data-science skills, and it should suit students with interests in computing and statistics. It is also an opportunity to learn valuable skills from the experienced members of the iNZight team.
Requirements: Very good R-programming skills.
 

Visualisation of Multiple Response data from complex surveys

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI214

iNZight already has tools for displaying and analysing multiple-response data from random samples. This project will involve generalising these methods so that they work with data obtained from more general survey sampling designs.
Requirements: Very good grades in Stats 310 and 340 and very good R-programming skills.
 

Topics in data visualisation

Supervisor

Chris Wild

Discipline

Statistics

Project code: SCI215

A student may wish to explore other data science topics with a view to later incorporation in iNZight such as: analytics for images; developing tools for survival and longitudinal data analysis; tools for webscraping; providing capabilities with colour including using colour metrics to choose colour palettes with maximally-different colours; developing Shiny apps for allowing sensitive data to be analysed and displayed with the data itself kept secure.
Requirements: Very good R-programming skills.
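
As an illustration of the colour-palette topic above, one simple way to pick "maximally-different" colours is a greedy farthest-point search: start from one colour and repeatedly add the candidate whose nearest already-chosen colour is as far away as possible. The sketch below is only an assumption about how this might be approached, not an iNZight method. It uses plain Euclidean distance in RGB as a crude stand-in for a proper colour metric (a perceptual space such as CIELAB would be more appropriate), and the function name `greedy_palette` and the coarse candidate grid are hypothetical.

```python
from itertools import product
from math import dist


def greedy_palette(candidates, k):
    """Greedily choose k colours that are far apart from each other.

    candidates: list of (r, g, b) tuples; the first one seeds the palette.
    Distance is Euclidean in RGB, an assumed stand-in for a colour metric.
    """
    palette = [candidates[0]]
    while len(palette) < k:
        # Pick the candidate whose closest palette colour is farthest away.
        best = max(candidates, key=lambda c: min(dist(c, p) for p in palette))
        palette.append(best)
    return palette


# A coarse 3 x 3 x 3 grid of RGB colours to choose from (illustrative only).
grid = list(product((0, 128, 255), repeat=3))
palette = greedy_palette(grid, 5)
```

The greedy search does not guarantee the best possible palette, but it is cheap and makes the role of the distance function explicit, so a different colour metric can be swapped in.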
 

Identifying variable responses to a stressor

Supervisor

Judi Hewitt

Discipline

Statistics

Project code: SCI216

This project will investigate whether the response of species abundances to sediment mud content is constant over time, or is affected by increasing delivery of mud to the site. The analytical techniques used will be analysis of covariance, generalised linear modelling, and some simple time series models. It will use data from seven Auckland east coast estuaries.

Modelling the scale of spatial interactions between key sandflat species

Supervisor

Judi Hewitt

Discipline

Statistics

Project code: SCI217

Inter-species and species-habitat interactions are usually modelled as if they occur at a fixed scale. However, the strength of any interaction is likely to be a function of the scale at which different species experience their environment and also the scale at which observations are made. This study will assess the interactions observed between key sandflat species in different-sized windows of observation by exploring the effect of varying lag and extent on simple correlations and by analysing cross-correlograms.
Requirements: The ability to use R to extract different-sized windows.
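
To make the ideas of lag and window size concrete, the following is a minimal sketch, assuming counts of two species recorded at equally spaced points along a one-dimensional transect (the real data may well be two-dimensional). The function names `pearson`, `cross_correlogram` and `aggregate` and the toy data are illustrative assumptions, not part of the project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def cross_correlogram(x, y, max_lag):
    """Correlation between x[i] and y[i + lag] for each lag in -max_lag..max_lag."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        out[lag] = pearson(xs, ys)
    return out


def aggregate(counts, window):
    """Sum counts in non-overlapping windows of the given size (a coarser scale)."""
    return [sum(counts[i:i + window])
            for i in range(0, len(counts) - window + 1, window)]


# Toy data: species B mirrors species A two sampling points further along.
species_a = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5]
species_b = [7, 7] + species_a[:-2]
correlogram = cross_correlogram(species_a, species_b, max_lag=3)
coarse_a = aggregate(species_a, window=2)
```

Recomputing the correlogram on aggregated counts (for example `aggregate(species_a, 2)` against `aggregate(species_b, 2)`) shows how the apparent strength and position of an interaction can change with the size of the observation window.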
 

Individual and Collective Wellbeing for Māori

Supervisor

Andrew Sporle

Discipline

Statistics

Project code: SCI218

This project will examine the association between individual and collective measures of wellbeing using a nationally representative official survey of Māori adults (Te Kupenga). Analysis will be undertaken using the Statistics NZ datalab facility.
Prerequisites: Familiarity with Māori wellbeing concepts and experience in using R for statistical analysis/modelling.
 

What are good principles for designing data visualisation interactives for large-scale learning?

Supervisor

Anna Fergusson

Discipline

Statistics

Project code: SCI219

Interactives that utilise the power of data visualisation have the potential to impact positively on learning, particularly when used with online learners and within large lectures of hundreds of students. However, further research is needed to establish what design principles should be followed when designing interactive data visualisations for learning. This research project will explore the design of data visualisation interactives from the dual perspectives of user and learner, and will include piloting the interactives with a small group of people. As this research project will involve creating new learning interactives, the student would need to have experience/skills with coding interactive web pages/applications. An interest in data science and/or statistics education would be advantageous.

Characterising changes in dietary patterns over time

Supervisor

Beatrix Jones
Clare Wall
John Thompson

Discipline

Statistics

Project code: SCI220

Food frequency questionnaires provide a rich multivariate picture of each respondent’s diet. Tracking this picture longitudinally is a challenging problem: long-term differences across time points are influenced by age-related changes, food fads, and potentially changes in the questionnaire used. As well as characterising these changes, we hope to understand which patterns at early time points set the stage for continued healthy eating. This project will explore multivariate methods for understanding dietary patterns through time. In particular, we will investigate the use of OnPLS, a technique for characterising shared and unique variability between sets of multivariate measurements. This will be compared to existing approaches.
Students should be confident R users; completion of STATS302 or similar would be an advantage.
 

Random graph dynamics and hitting times

Supervisor

Jessie Goodman

Discipline

Statistics

Project code: SCI221

Produce a random graph by connecting pairs of vertices uniformly at random. Then run a random walk on this random graph: at each step, move to a uniformly chosen neighbour of the current position. The hitting time is the number of steps needed to reach a particular target vertex, and it varies in a particular way depending on the size of the random graph.
This project looks at the effect of changing the random graph. Between each random walk step, "rewire" some edges: pick a fraction of edges, disconnect the vertices on either side, then randomly reconnect those vertices. Do these graph dynamics make it faster (or slower) to reach the target vertex? How many edges need to be rewired to change the typical hitting time?
The student working on this project should have some familiarity with stochastic processes, probability and graphs.
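
The setup described above can be explored by simulation. The following is a minimal sketch under several assumptions that the description leaves open: the graph is built from `m` distinct edges chosen uniformly at random, rewiring shuffles the endpoints of the removed edges and pairs them up again (dropping any self-loops or duplicate edges that result, so the edge count can shrink slightly), and a walker on an isolated vertex waits in place. All function names and parameters are hypothetical.

```python
import random


def random_graph(n, m, rng):
    """Return a set of m distinct edges (u, v) with u < v, chosen uniformly."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges


def rewire(edges, fraction, rng):
    """Disconnect a fraction of edges and randomly reconnect their endpoints."""
    edges = set(edges)
    k = int(round(fraction * len(edges)))
    if k == 0:
        return edges
    removed = rng.sample(sorted(edges), k)
    edges.difference_update(removed)
    endpoints = [v for e in removed for v in e]
    rng.shuffle(endpoints)
    for i in range(0, len(endpoints), 2):
        u, v = endpoints[i], endpoints[i + 1]
        if u != v:  # skip self-loops; duplicates vanish in the set
            edges.add((min(u, v), max(u, v)))
    return edges


def hitting_time(n, m, fraction, rng, start=0, target=1, max_steps=10_000):
    """Steps until a random walk from start first reaches target, else None."""
    edges = random_graph(n, m, rng)
    pos = start
    for step in range(1, max_steps + 1):
        nbrs = sorted(b if a == pos else a for a, b in edges if pos in (a, b))
        if nbrs:  # an isolated walker stays where it is
            pos = rng.choice(nbrs)
        if pos == target:
            return step
        edges = rewire(edges, fraction, rng)
    return None


def mean_hitting_time(n, m, fraction, runs=200, seed=0):
    """Average hitting time over runs that reached the target (None if none did)."""
    rng = random.Random(seed)
    times = [hitting_time(n, m, fraction, rng) for _ in range(runs)]
    hits = [t for t in times if t is not None]
    return sum(hits) / len(hits) if hits else None
```

Comparing, say, `mean_hitting_time(50, 100, 0.0)` with `mean_hitting_time(50, 100, 0.1)` gives a first empirical look at whether rewiring speeds up or slows down the walk. Note that averaging only over runs that hit the target within `max_steps` biases the static case downwards, since there the start and target may lie in different components.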