Statistics

Multinomial Logit Model Extensions

Supervisor

Dr Thomas Yee

Faculty of Science

Project code: SCI001

The aim is to improve the multinomial() family function in the VGAM R package so that it handles sparse data. The multinomial logit model is the standard model for regressing a nominal categorical response on a set of explanatory variables, but it can suffer from numerical problems when the data are sparse. Bias reduction offers a solution (Ding, B. and Gentleman, R. (2005), Journal of Computational and Graphical Statistics 14(2): 280--298). The main task is to implement bias reduction within the function so that it handles complete separation, quasi-complete separation and overlap. We could also implement a score test, as well as the Hausman-McFadden test for independence of irrelevant alternatives (IIA). Time permitting, another useful feature would be support for the nested multinomial logit model, although this would be quite a challenge. This work would suit somebody with a solid understanding of generalized linear models and good R programming skills. The background to this work is Yee, T. W. (2015), Vector Generalized Linear and Additive Models: With an Implementation in R. Springer: New York, USA. Chapter 18 is particularly relevant.
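To illustrate why complete separation causes numerical problems, here is a minimal sketch in Python. It is not VGAM code and does not implement bias reduction. The toy data, the coefficient values and the function names are all assumptions made for illustration. With completely separated data, scaling a separating set of coefficients keeps increasing the log-likelihood towards zero, so no finite maximum likelihood estimate exists.

```python
import math

def softmax(scores):
    """Convert linear predictor scores into category probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(intercepts, slopes, xs, ys):
    """Multinomial logit log-likelihood with one covariate.

    Category 0 is the baseline (score fixed at 0); intercepts[k] and
    slopes[k] belong to category k + 1.
    """
    ll = 0.0
    for x, y in zip(xs, ys):
        scores = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
        ll += math.log(softmax(scores)[y])
    return ll

# Hypothetical toy data: the three categories are completely separated by x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1, 2, 2]

# Assumed coefficients that classify every observation correctly.
base_intercepts = [0.5, -1.0]
base_slopes = [1.0, 2.0]

# Scaling the separating coefficients pushes the log-likelihood towards 0.
results = []
for scale in (1.0, 5.0, 25.0):
    ll = log_likelihood(
        [scale * a for a in base_intercepts],
        [scale * b for b in base_slopes],
        xs,
        ys,
    )
    results.append(ll)
    print(f"scale = {scale:5.1f}  log-likelihood = {ll:.6f}")
```

An iterative fitting routine run on such data tends to drift towards ever larger coefficients, which is the behaviour that bias reduction is intended to avoid.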

Multicountry comparison of costs and financing of routine immunisation

Supervisor

Dr Claudia Rivera-Rodriguez

Faculty of Science

Project code: SCI002

In many low-income countries, the costs of immunisation are not tracked. Examples include Benin, Ghana, Uganda, Zambia, Moldova and Honduras. Some recent studies, such as the EPIC studies, have sought to estimate programme costs on the basis of detailed information collected from a sub-sample of facilities. Estimates have been obtained via accurate measurement and appropriate regression analyses. However, the uncertainty of the estimated total costs has not been quantified. This can be done using a design-based approach that yields standard errors and/or 95% confidence intervals.

In this project we aim to use available software to calculate variances of estimators resulting from complex surveys. We will also provide an overview of statistical uncertainty in the context of complex sampling designs. We aim to compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap.
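The two routes to a standard error mentioned above can be sketched for the simplest possible design. This is a hedged Python illustration, not the survey package and not the project's method. It assumes a simple random sample without replacement of facilities. The cost figures, the population size of 200 and the function names are hypothetical. A real complex design with strata, clusters or unequal weights needs design-specific formulae or a design-aware bootstrap.

```python
import random
import statistics

def estimate_total(sample_costs, population_size):
    """Expansion estimator of the total: N times the sample mean."""
    return population_size * statistics.mean(sample_costs)

def analytic_se(sample_costs, population_size):
    """Formula-based SE under simple random sampling without replacement."""
    n = len(sample_costs)
    fpc = 1 - n / population_size  # finite population correction
    return population_size * (fpc * statistics.variance(sample_costs) / n) ** 0.5

def bootstrap_se(sample_costs, population_size, replicates=2000, seed=1):
    """Naive bootstrap SE: resample facilities with replacement.

    This simple version ignores the finite population correction.
    """
    rng = random.Random(seed)
    n = len(sample_costs)
    totals = [
        estimate_total(rng.choices(sample_costs, k=n), population_size)
        for _ in range(replicates)
    ]
    return statistics.stdev(totals)

# Hypothetical annual costs (in thousands) for 10 sampled facilities
# out of an assumed population of 200 facilities.
costs = [12.0, 15.5, 9.8, 20.1, 14.3, 11.7, 18.4, 13.9, 16.2, 10.6]
N = 200

total = estimate_total(costs, N)
se_formula = analytic_se(costs, N)
se_boot = bootstrap_se(costs, N)

# Normal-approximation 95% confidence interval from the formula-based SE.
ci = (total - 1.96 * se_formula, total + 1.96 * se_formula)
print(f"total = {total:.1f}, formula SE = {se_formula:.1f}, bootstrap SE = {se_boot:.1f}")
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

In this setting the two standard errors should be of similar size. The naive bootstrap is usually slightly smaller because it resamples only n observations and omits the finite population correction.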

This project requires experience with R, the survey package and complex survey sampling.

Effects of EFL on success in introductory statistics

Supervisor

Dr Claudia Rivera-Rodriguez

Faculty of Science

Project code: SCI003

Equity in education is one of the main goals of the University of Auckland. Students from a non-English speaking background, however, face many challenges that may affect this equity goal. They need to function in English to succeed as their English-speaking peers do. Even though a certain level of English is required for admission to the University, students still face many language difficulties, such as the speed of speech and accents. For example, students whose first language is not English often stop attending lectures because they cannot follow the content, mainly because of language barriers. These students also attend tutorials and use other forms of help less than they should. One of our interests is evaluating the relationship between a student's first language and their final grades. In particular, are students with English as a second language more likely to give up on papers and get a DNS grade?

The project will also take into account other demographic characteristics such as ethnicity and gender. Specifically, we want to use records from students enrolled in the statistics courses STATS 101 and STATS 108. It is of interest to compare how any effects of English as a second language differ between these papers. The project will focus on first semester 2017 results.

This project requires experience with R and regression analysis.