Statistics

Generally-Altered, -Inflated and -Truncated (GAIT) regression

Supervisor

Dr Thomas Yee
Faculty of Science
Project code: SCI014

This project would suit a strong student in mathematical statistics. Zero-altered (hurdle), zero-inflated and zero-truncated (positive) count distributions are now well established, especially in their Poisson and binomial forms. Recently I have proposed generally-altered, -inflated and -truncated (GAIT) distributions, in which any finite set of support values may be 'special' (rather than just 0). Two variants of GAIT models have been proposed: a nonparametric method based on the multinomial logit model, and a parametric method based on finite mixtures of the parent distribution on differing supports. So far this work has been developed for count distributions; for example, the R package VGAM now has several new functions that can be applied to heaped data.
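To make the idea of 'special' support values concrete, here is a minimal Python sketch of a Poisson parent that is truncated at one set of values and inflated at another. The mixture form, the order (truncate first, then inflate) and the names `git_poisson_pmf`, `pi` and `truncate` are illustrative assumptions; this is not the parametrization used in VGAM, and alteration is omitted.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Ordinary Poisson probability P(Y = y), computed in log space (lam > 0)."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def git_poisson_pmf(y: int, lam: float, pi: dict[int, float], truncate: set[int]) -> float:
    """Hypothetical truncated-then-inflated Poisson pmf.

    pi maps each inflated value to its extra point mass;
    truncate lists the values removed from the parent's support.
    """
    if set(pi) & truncate:
        raise ValueError("inflated and truncated values must be disjoint")
    if y in truncate:
        return 0.0
    # Renormalise the parent over the support that remains after truncation.
    kept_mass = 1.0 - sum(poisson_pmf(t, lam) for t in truncate)
    parent = poisson_pmf(y, lam) / kept_mass
    # Mix the renormalised parent with point masses at the special values.
    return (1.0 - sum(pi.values())) * parent + pi.get(y, 0.0)
```

For heaped counts, one might inflate at multiples of 5 and truncate at 0, e.g. `git_poisson_pmf(10, 6.0, {5: 0.1, 10: 0.15}, {0})`. The probabilities still sum to 1 because the inflation weights are taken out of the parent's share.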

This project involves extending these results to other common distributions, for example continuous distributions such as the normal and gamma, allowing for alteration, inflation and truncation. We need to derive their basic properties and quantities such as the expected information matrices under a wide range of realistic scenarios. This work is expected to result in additional software useful for data scientists. Time allowing, we will also investigate Cormack-Jolly-Seber regression. LaTeX and R skills would be helpful for this work.

Dirichlet Diffusion Trees as realistic priors for complex astronomical objects

Supervisor

Dr Brendon Brewer
Faculty of Science
Project code: SCI015

Many objects astronomers observe, such as galaxies and nebulae, have complex structure. However, to describe their shapes, scientists often adopt simplistic models, such as elliptical profiles described by a simple equation with a handful of parameters. In some applications where the shape of the object is a nuisance parameter, these overly simple models are likely to lead to erroneous conclusions about other aspects of the problem. In this project I would like to adopt Radford Neal’s Dirichlet Diffusion Tree (DDT) prior, originally proposed for density estimation in statistics, and (a) implement it in C++ rather than R, and (b) apply it to astronomical data to make analyses more realistic. I will need a student who is strong in Bayesian statistics and computer programming.

Predicting the risk of hip replacement for patients with arthritis in the presence of missing data

Supervisor

Dr Claudia Rivera-Rodriguez
Faculty of Science
Project code: SCI016

This project will analyse routinely collected data from the National Minimum Dataset (NMDS) on patients who have a hospital discharge with arthritis. For each of these patients, we will search their 10-year history of hospital discharges to identify those who had a hip replacement, along with potential risk factors.

The project will use log models with post-stratification.
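Post-stratification reweights a sample so that each stratum (for example an age group) contributes in proportion to its known population size: the estimate is the sum over strata h of (N_h / N) times the stratum's sample mean. A minimal Python sketch, assuming hypothetical stratum labels and population counts; the 'log models' part of the project is not shown.

```python
from collections import defaultdict


def post_stratified_mean(samples: list[tuple[str, float]], pop_sizes: dict[str, int]) -> float:
    """Post-stratified estimate of a population mean (or proportion).

    samples: (stratum, outcome) pairs from the sample.
    pop_sizes: known population count N_h for each stratum (hypothetical here).
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for stratum, y in samples:
        totals[stratum] += y
        counts[stratum] += 1
    n_pop = sum(pop_sizes.values())
    estimate = 0.0
    for stratum, n_h in pop_sizes.items():
        if counts[stratum] == 0:
            raise ValueError(f"no sampled units in stratum {stratum!r}")
        # Weight each stratum's sample mean by its population share N_h / N.
        estimate += (n_h / n_pop) * totals[stratum] / counts[stratum]
    return estimate
```

With a binary outcome (1 = hip replacement), the same function gives a post-stratified proportion.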

Requirements: R programming, modelling and sampling.

Continuous-time branching processes and random graphs

Supervisor

Dr Jesse Goodman
Faculty of Science
Project code: SCI017

Take n vertices and assign each of them a specified number of neighbours. Then connect pairs of vertices by edges, uniformly at random but subject to the requirement that each vertex retains its specified number of neighbours. The result is an example of a random graph.  
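The construction above is usually called the configuration model. A minimal Python sketch: each vertex gets one 'half-edge' per specified neighbour, and a uniformly random matching of all half-edges gives the edges. The function and parameter names are illustrative.

```python
import random
from typing import Optional


def configuration_model(degrees: list[int], seed: Optional[int] = None) -> list[tuple[int, int]]:
    """Pair half-edges uniformly at random.

    Vertex v receives degrees[v] half-edges. As in the standard configuration
    model, self-loops and multiple edges can occur.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the total degree must be even")
    rng = random.Random(seed)
    # One list entry per half-edge, labelled by the vertex it belongs to.
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)
    # Consecutive pairs of a uniformly shuffled list form a uniform matching.
    return [(half_edges[i], half_edges[i + 1]) for i in range(0, len(half_edges), 2)]
```

For example, `configuration_model([3] * 10, seed=1)` pairs the 30 half-edges of a 3-regular degree sequence on 10 vertices. Conditioning on the result having no self-loops or multiple edges gives a uniformly random simple graph with the specified degrees.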

We can model distances within this random graph by assigning each edge a positive random weight, representing its length. Pick a starting vertex and look at the vertices close to it. This local neighbourhood is closely connected to a continuous-time branching process, in which a birth at time t corresponds to a vertex joined to the starting vertex by a path of total length t.
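A minimal Python sketch of this correspondence, assuming (for illustration only) that every vertex has d neighbours and that edge lengths are independent Exp(1). Exploring outward from the starting vertex in order of distance, the root has d children and every later vertex has d - 1 new ones, so the birth times are exactly the path lengths to newly reached vertices in the approximating tree.

```python
import heapq
import random


def birth_times(d: int, t_max: float, seed: int = 0) -> list[float]:
    """Birth times up to t_max in a continuous-time branching process.

    Hypothetical setting: the root has d children, every later individual
    has d - 1 children, and each child is born an independent Exp(1) time
    after its parent (the length of the connecting edge).
    """
    rng = random.Random(seed)
    births = [0.0]  # the starting vertex is "born" at time 0
    # Min-heap of pending birth times (parent's birth time + edge length).
    pending = [rng.expovariate(1.0) for _ in range(d)]
    heapq.heapify(pending)
    while pending and pending[0] <= t_max:
        t = heapq.heappop(pending)
        births.append(t)
        # The new individual's d - 1 children arrive after further Exp(1) lengths.
        for _ in range(d - 1):
            heapq.heappush(pending, t + rng.expovariate(1.0))
    return births
```

In this Markovian case the expected number of births grows exponentially at rate d - 2, which is the kind of growth property the project would study when d is large.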

This project will investigate distances in this random graph via the growth properties of such continuous-time branching processes, especially in the regime where vertices have a large number of neighbours.

Stochastic modelling of patients’ trajectories in a hospital

Supervisor

Dr Azam Asanjarani
Faculty of Science
Project code: SCI022

The aim of this project is to construct a stochastic model for predicting individuals’ progression through the various stages of a disease. The results can also contribute to building a model that predicts the risks associated with patients’ expected trajectories through intensive care units. We will use statistical analysis and the method of phases to solve this problem.
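The method of phases represents a time, such as a length of stay, as the time spent passing through a sequence of exponential phases before leaving. A minimal Python sketch, assuming a Coxian structure (one common phase-type form, used here purely as an illustration and not as the project's model) with hypothetical rates and transition probabilities:

```python
import random


def coxian_sample(rates: list[float], p_next: list[float], rng: random.Random) -> float:
    """Draw one time to absorption from a Coxian phase-type distribution.

    Phase i lasts Exp(rates[i]); afterwards the patient moves to phase i + 1
    with probability p_next[i], otherwise leaves (absorption).
    p_next has one fewer entry than rates.
    """
    t = 0.0
    for i, mu in enumerate(rates):
        t += rng.expovariate(mu)
        if i == len(rates) - 1 or rng.random() >= p_next[i]:
            break
    return t


def coxian_mean(rates: list[float], p_next: list[float]) -> float:
    """Exact mean: sum over phases of P(reach phase i) / rates[i]."""
    mean, reach = 0.0, 1.0
    for i, mu in enumerate(rates):
        mean += reach / mu
        if i < len(rates) - 1:
            reach *= p_next[i]
    return mean
```

With rates [2.0, 1.0, 0.5] and p_next [0.6, 0.3], the exact mean is 0.5 + 0.6 + 0.36 = 1.46, and the average of many simulated draws should be close to it.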

Control of stochastic queueing networks

Supervisor

Dr Azam Asanjarani
Faculty of Science
Project code: SCI023

The main aim of this project is to devise an appropriate, optimal model for a network of queues, suited to practical applications in health care, energy, manufacturing, traffic and communication networks. We will use matrix analytic methods to solve this problem.

The nominated student(s) should have a strong background in probability. Programming skill (R or Matlab) is an advantage.

Nomograms for the prediction of caesarean section in nulliparous women

Supervisor

Assoc. Prof. Alain C. Vandal
Faculty of Science
Project code: SCI024

There are risks associated with unplanned caesarean delivery (CD). Predicting the need for a CD is especially difficult in nulliparous women (women who have never given birth). For this reason, midwives and obstetricians seek a screening test to predict the need for CD in these women. A nomogram – a graphical scoring aid – for this purpose has been devised using data on Irish births (Burke et al, AJOG 2017), but the risk is suspected to differ by ethnicity, and perhaps to depend on factors specific to New Zealand.

Using data from 1183 first births at Auckland’s Middlemore Hospital, the project will produce two nomograms to estimate the risk of CD in nulliparous women: one using ultrasound scan data and one without. The nomograms will be derived from appropriately selected regression models and classifiers fitted to the data.

Stochastic models for biodiversity

Supervisor

Assoc. Prof. Simon Harris
Faculty of Science
Project code: SCI025

Stochastic models have a significant role to play in understanding the biodiversity of species. Some neutral models for extinction and speciation involve critical branching processes, which can be well understood using probability theory, including determining the genealogical structure of reconstructed phylogenetic trees for currently living species. When species exhibit varying levels of fitness, more general multi-type branching processes can be used as (non-neutral) speciation models, although these are often far more challenging to analyse mathematically. However, real reconstructed phylogenetic trees often look surprisingly unbalanced, and such patterns are not readily explained by simple, natural stochastic models. This project in probability will investigate this issue and other problems arising in the stochastic modelling of biodiversity.

Inhomogeneous branching Brownian motions

Supervisor

Assoc. Prof. Simon Harris
Faculty of Science
Project code: SCI026

Brownian motion is a fundamental model in modern probability theory. It represents the random diffusion of a particle and appears naturally as the universal scaling limit of random walks. Branching Brownian motions form an important class of stochastic population models in which each particle currently alive behaves independently, moving around in space as a diffusion while giving birth to offspring at random during its lifetime. Typical problems include determining survival probabilities, how quickly the population grows and colonises space, and the genealogies of samples of individuals. Branching Brownian motions are also intimately related to (non-linear) reaction-diffusion equations, which arise in situations as diverse as chemical and physical reactions, population genetics, geology and elsewhere. This project aims to investigate some inhomogeneous branching Brownian motions using both classical and cutting-edge probabilistic techniques.
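A minimal Python sketch of an inhomogeneous binary branching Brownian motion, in which the branching rate depends on a particle's position. It uses a simple time-stepping approximation (an assumption for illustration, not an exact simulation): in each step of length dt, every particle takes a Gaussian step with variance dt and then splits in two with probability branch_rate(x) * dt.

```python
import math
import random
from typing import Callable


def simulate_bbm(branch_rate: Callable[[float], float], t_max: float,
                 dt: float = 0.01, seed: int = 0) -> list[float]:
    """Approximate particle positions at time t_max, starting from one particle at 0."""
    rng = random.Random(seed)
    particles = [0.0]
    for _ in range(round(t_max / dt)):
        next_gen = []
        for x in particles:
            # Brownian increment over a step of length dt.
            x += rng.gauss(0.0, math.sqrt(dt))
            next_gen.append(x)
            # Position-dependent branching: the offspring starts at the parent's position.
            if rng.random() < branch_rate(x) * dt:
                next_gen.append(x)
        particles = next_gen
    return particles
```

With a constant rate of 1, the expected population at time t is close to e^t (up to discretisation error); an inhomogeneous example is `simulate_bbm(lambda x: 1.0 if x > 0 else 0.0, 2.0)`, where particles branch only on the positive half-line.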

Spatial modelling of sensor networks

Supervisor

Dr Charlotte Jones-Todd, Prof. David Williams
Faculty of Science
Project code: SCI029

Low-cost ozone sensors make monitoring air quality more accessible, allowing members of the public to install their own devices and making it more cost-effective to increase the spatial coverage of a sensor network. Yet the reliability of the data is often questioned: without frequent sensor calibration, can the measurements be trusted? In urban areas, air quality is highly variable in both space and time; to model these fluctuations adequately, measurement error must first be accounted for.

This is a joint project between Statistics and Chemical Sciences. Its goals are (1) to develop statistical methodology for modelling low-resolution ozone sensor networks, and (2) to assess the reliability of these networks at different spatial and temporal scales. This will enable identification of areas at risk of high air pollution and allow stakeholders to implement potential mitigation techniques.

This project will involve data cleaning, manipulation, and visualisation. The student should expect to be heavily involved in statistical model development and be comfortable using a statistical programming language, ideally R.