Statistics

Bayesian methods for representing dietary patterns

Supervisor

Dr Beatrix Jones

Faculty of Science

Project code: SCI014

Conventionally, diet is measured by food frequency questionnaires or food diaries, producing a high-dimensional response. Rotated principal components are then used to summarize variability across the population. The loadings of these components are thresholded and the remaining foods are interpreted, producing “dietary patterns.”
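The conventional approach can be illustrated with a minimal sketch. It makes several simplifying assumptions: the intake data and food names are invented, only the first principal component is extracted (by power iteration), the rotation step is omitted, and the cutoff of 0.3 is an arbitrary illustrative choice rather than a recommended value.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Unit-length leading eigenvector of cov, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def thresholded_pattern(foods, loadings, cutoff=0.3):
    """Keep the foods whose absolute loading reaches the cutoff."""
    return [f for f, l in zip(foods, loadings) if abs(l) >= cutoff]

# Hypothetical servings per day for four respondents (invented data).
FOODS = ["fruit", "vegetables", "soft_drinks"]
INTAKES = [
    [4.0, 4.0, 1.1],
    [0.0, 0.0, 1.1],
    [3.0, 3.0, 0.9],
    [1.0, 1.0, 0.9],
]

loadings = first_component(covariance_matrix(INTAKES))
pattern = thresholded_pattern(FOODS, loadings)
print(pattern)
```

In this toy data, fruit and vegetable intake vary together while soft drink intake varies little, so the thresholded first component retains only the first two foods. The choice of cutoff is ad hoc, which is the weakness the project aims to address.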

This project will explore a more principled way of selecting important foods: sparse Bayesian factor analysis. This framework also provides a way to examine the association between dietary patterns and health outcomes such as BMI or fasting glucose levels.

Implementation of Bayesian spectral density estimation algorithms

Supervisors

Assoc. Prof. Renate Meyer and Dr Patricio Maturana Russel

Faculty of Science

Project code: SCI015

Advanced LIGO (Laser Interferometer Gravitational-Wave Observatory) made the very first direct measurement of gravitational waves in September 2015.

Markov chain Monte Carlo (MCMC) techniques were used for posterior computation of the signal parameters of the inspiralling binary black hole system. The current time series models used for parameter estimation of gravitational wave signals assume that the time-varying dimensionless strain at the detector decomposes into a signal plus additive noise, where the noise is assumed to be Gaussian and stationary with known power spectral density. In practice, however, the power spectral density is estimated beforehand from a separate stretch of signal-free data.
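To make the role of the power spectral density concrete, the sketch below computes a periodogram from a simulated stretch of signal-free noise and evaluates the Whittle approximation to the Gaussian log-likelihood for a candidate spectral density. The Whittle approximation is one common building block in Bayesian spectral density estimation; its use here is an assumption for illustration, not a description of the methods used by LIGO or by the R package mentioned in this project. The function names, the white-noise data and the normalisation (spectral density defined on angular frequency, so unit-variance white noise has level 1/(2π)) are likewise illustrative choices.

```python
import cmath
import math
import random

def periodogram(x):
    """Periodogram I(w_k) = |sum_t x_t exp(-i w_k t)|^2 / (2 pi n) at the
    positive Fourier frequencies w_k = 2 pi k / n, k = 1..floor((n-1)/2)."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    values = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * k / n
        dft = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(centred))
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return values

def whittle_loglik(pgram, psd):
    """Whittle log-likelihood, up to an additive constant, for a candidate
    spectral density evaluated at the same frequencies as the periodogram."""
    return -sum(math.log(s) + i / s for i, s in zip(pgram, psd))

# Simulated signal-free stretch: Gaussian white noise (illustrative only).
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(128)]

pgram = periodogram(noise)
flat_level = sum(pgram) / len(pgram)  # best-fitting constant spectrum
print(whittle_loglik(pgram, [flat_level] * len(pgram)))
```

Among constant spectral densities, the Whittle log-likelihood is maximised at the mean of the periodogram ordinates, which is why `flat_level` is used as the candidate above. A Bayesian approach would instead place a prior on a flexible spectral density and sample it by MCMC, potentially alongside the signal parameters.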

This project aims to develop Python code based on an existing R package for Bayesian spectral density estimation, with the ultimate aim of integrating it into the LIGO software library and estimating signal and noise simultaneously.

A good knowledge of R and Python and good programming skills are essential; an interest in Bayesian inference, MCMC techniques and time series is a bonus. This project would suit data scientists, computer scientists, and statistics students with good computing skills.

Implementations of Count Distributions in VGAM

Supervisor

Dr Thomas Yee

Faculty of Science

Project code: SCI016

The VGAM R package, which implements Fisher scoring via iteratively reweighted least squares, fits many univariate and multivariate distributions and provides maximum likelihood estimates and the variance-covariance matrix as its usual output. This work will focus on count distributions, e.g., the Katz family, Poisson-Tweedie, discrete Weibull and Linnik distributions. For these, the expected information matrix needs to be derived and implemented robustly in software. Care is needed in the choice of initial values and the use of stable parameterizations. Other distributional properties, such as the cumulative distribution function and its inverse, and random number generation, also need to be derived and implemented where possible. Finite mixtures of some relatively simple distributions can be explored. Time permitting, distributions involving special functions, such as the hypergeometric and iterated exponential, could be investigated; some of these would be quite challenging.
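As an illustration of the distributional functions mentioned above, the following sketch implements the probability function, cumulative distribution function, quantile function and random number generation for a discrete Weibull distribution. It assumes the type I parameterization, in which P(Y >= y) = q^(y^beta) for y = 0, 1, 2, ..., with 0 < q < 1 and beta > 0. The function names imitate R's d/p/q/r naming convention but are hypothetical; they are not VGAM's API, and the sketch is written in Python purely for illustration.

```python
import math
import random

def ddweibull(y, q, beta):
    """P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..."""
    if y < 0 or y != int(y):
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)

def pdweibull(y, q, beta):
    """Cumulative distribution function F(y) = 1 - q^((floor(y)+1)^beta)."""
    if y < 0:
        return 0.0
    return 1.0 - q ** ((math.floor(y) + 1) ** beta)

def qdweibull(p, q, beta):
    """Smallest integer y with F(y) >= p, for 0 <= p < 1."""
    if p <= 0.0:
        return 0
    # Closed-form inverse of F, followed by a guard against rounding error.
    y = math.ceil((math.log1p(-p) / math.log(q)) ** (1.0 / beta) - 1.0)
    y = max(y, 0)
    while y > 0 and pdweibull(y - 1, q, beta) >= p:
        y -= 1
    while pdweibull(y, q, beta) < p:
        y += 1
    return y

def rdweibull(n, q, beta, rng):
    """Draw n variates by inversion: apply the quantile function to uniforms."""
    return [qdweibull(rng.random(), q, beta) for _ in range(n)]

# With beta = 1 the distribution reduces to the geometric distribution.
print(ddweibull(2, 0.5, 1.0))
print(rdweibull(5, 0.5, 1.5, random.Random(0)))
```

The two short loops in `qdweibull` correct the closed-form inverse when floating-point rounding places it one step off. This is the kind of robustness detail the project would need to handle for each distribution.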

This work would suit somebody with a solid background in mathematical statistics, familiarity with generalized linear models, and R programming skills. The background to this work is Yee, T. W. (2015), Vector Generalized Linear and Additive Models: With an Implementation in R. Springer: New York, USA, especially chapters 11 and 17.