Ihaka Lecture Series
In March 2017 the Department of Statistics launched an annual lecture series named after Associate Professor Ross Ihaka in honour of his contributions to the field. Find out about the 2022 lecture series below.
Ross Ihaka, along with Robert Gentleman, co-created R – a statistical programming language now used by the majority of the world’s practising statisticians. It is hard to over-emphasise the importance of Ross’s contribution to our field. We named this lecture series in his honour to recognise his work in perpetuity.
Find out more about Ross Ihaka here
Building Building Blocks for Data Science
The field of Data Science is fortunate because the most popular software tools for Data Science are programming languages. The availability of such tools depends on people building effective, efficient, and open software tools for Data Science. This means that most Data Scientists learn to write code, and some Data Scientists are also developers, writing code so that other people can write code.
The 2022 Ihaka Lecture Series features three speakers who develop software tools for Data Science, building systems that can be built upon in turn.
RSVP here: https://ihaka_2022.eventbrite.co.nz
Lectures commence at 6.30pm
MLT2/303-102, Building 303
38 Princes Street
NB: The presentation will take place via Zoom, with the speaker at a remote location. It will be streamed live in the lecture theatre, and there will be the usual opportunity for Q&A with the speaker at the end of the session.
Refreshments will be available before each lecture at 6pm
Lecture 1: Thursday 28 July 2022
The genesis of experimentation
Dr Emi Tanaka, Senior Lecturer in Statistics, Monash University
Experiments are essential endeavours for understanding the processes or phenomena around us through the analysis of experimental data. However, as a precursor to any analysis, the importance of the experimental design and the data collection process cannot be emphasised enough.
There is no salvation for rubbish data, yet there is far more focus on the analysis of experimental data than on any of the steps prior to the analysis. In this talk, I introduce a framework called “the grammar of experimental designs”, implemented in the edibble R package, which puts the focus on capturing the user’s intention and understanding of the experimental structure to plan, design and simulate experiments. This approach differs considerably from standard, often recipe-driven, approaches and has the potential to encourage users to reflect on and revise designs tailored to their experimental needs.
Dr Emi Tanaka is a lecturer in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental designs, motivated primarily by problems in bioinformatics and agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter’s Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.
You can see the slides from this lecture here, and watch the lecture on YouTube here.
Lecture 2: Thursday 4 August 2022
New plumbing: Adding a pipe operator to base R
Professor Luke Tierney, Ralph E. Wareham Professor of Mathematical Sciences, the University of Iowa
The forward pipe operator ‘%>%’ was introduced to R by the ‘magrittr’ package and has since become an integral part of many data science workflows. Forward pipe operators have also been introduced in other languages in recent years. R 4.1.0 added a new forward pipe operator ‘|>’ to the base R language. This talk will review the history of forward pipe operators in R and other languages, and explain the motivation and design decisions behind the new operator.
Luke Tierney is Ralph E. Wareham Professor of Mathematical Sciences at the University of Iowa. He has been a member of the R Core Team since 1998. His research has focused mainly on two aspects of computational methods and tools to support statistical analysis. The first area involves developing computational methods, based on approximations and simulation methods, for carrying out Bayesian data analysis. The second involves designing, developing, and maintaining computing environments for statistics and data science.
Looking on the bright side
We should be worried about how much of our personal data businesses are gathering, but are there benefits to be had from allowing our health system to know more about us? We are on constant guard to protect our computers from viruses, but when a virus strikes humanity, can our computers help to protect us? We know that giving teenagers the ability to communicate 24/7 can have negative outcomes, but what happens when scientists get hold of social media tools?
The 2021 Ihaka Lecture Series featured three speakers who described how modern computing can be used to positively impact the world.
The recordings of each lecture are available to view below.
Lecture 1: Thursday 29 July 2021
Data Science in the Connected Era
Dr Simon Urbanek, Senior Lecturer, Department of Statistics, University of Auckland
Our world is increasingly interconnected, which has several implications. On the one hand, it increases the amount and variety of data we can collect to make informed decisions and improve our lives; on the other, it allows us to perform data analyses without constraints related to the physical location of the data or compute infrastructure.
Modern computer technologies such as cloud computing and the Web have given rise to social media, but in this talk we will explore the possibilities of leveraging them for visualisation and data analysis, connecting people with data across the world and fostering collaboration.
We will illustrate the benefits of that approach using RCloud, a collaborative tool for data analysis and interactive visualisation that supports several data analytic languages, distributed computing, discovery, sharing and reproducible research. It allows us to analyse data collaboratively at a large scale and communicate results efficiently.
Dr Simon Urbanek is a Senior Lecturer in the Department of Statistics at the University of Auckland. Simon obtained his PhD in Statistics from Augsburg University, Germany in 2004 and has worked at AT&T Labs in Data Science and AI Research for 15 years, leading research and projects on large-scale data analysis in the areas of mobility networks, TV and advertising.
His main interests are visualisation, interactive graphics, big data analytics, and statistical and distributed computing. He is a member of the R Core Development Team and author of numerous popular R packages including Rserve, multicore, rJava, iPlots, RJDBC and iotools.
Lecture 2: Thursday 5 August 2021
Implementing a Machine-Learning Tool to Support High-Stakes Decisions in Child Welfare: A case study in Human Centred AI
Professor Rhema Vaithianathan, Centre for Social Data Analytics, AUT
Data analytics techniques like predictive risk modelling offer incredible opportunities to learn from rich data sets and make decisions supported by data. But while the private sector has been quick to realise the benefits of data analytics (especially as a tool to drive profitability), the public sector has moved much more slowly, despite needing new solutions to many wicked social problems.
Professor Rhema Vaithianathan will reflect on what we can learn about applying data analytics in a trusted way, from the very different experiences of the private and public sectors. In particular, she will talk about different approaches to key concepts like consent, transparency, fairness and community voice and how they can contribute to project success or failure. She will go on to talk about new ‘rules of engagement’ that are emerging for social good uses of data analytics, drawing on her experiences implementing the Allegheny Family Screening Tool, a machine learning tool used to support screening of child abuse calls in Allegheny County, PA (United States) since 2016, and scaling out of this work in California and Colorado.
Professor Vaithianathan is a Professor of Economics at Auckland University of Technology where she is director of the Centre for Social Data Analytics, a research centre focused on using data analytics for social impact. She is also a Professor of Social Data Analytics at the Institute for Social Science Research at The University of Queensland, where she leads a second node of the Centre for Social Data Analytics.
Lecture 3: Thursday 12 August 2021
Modelling to support the COVID-19 response in Aotearoa New Zealand
Dr Rachelle Binny, Manaaki Whenua-Landcare Research and Te Pūnaha Matatini
Mathematical models are playing an important role in the ongoing pandemic, providing insights into the spread of the virus and the effects of interventions to help inform response strategies. This seminar will give an overview of mathematical modelling by Te Pūnaha Matatini to support New Zealand’s COVID-19 response. We will describe the models used to simulate spread of COVID-19 in New Zealand, how they can help inform decisions on switching between Alert Levels, and how we are modelling the risk of new cases arriving at the border.
Rachelle Binny is a mathematical biology researcher at Manaaki Whenua - Landcare Research in Christchurch, NZ, and a Principal Investigator in Te Pūnaha Matatini, the NZ Centre of Research Excellence for Complex Systems and Networks. Her research lies at the interface of mathematics, statistics and biology and is data-driven. Following a BSc in Mathematical Biology (University of Dundee, Scotland), she undertook a PhD (University of Canterbury, Christchurch) to develop new models of collective cell behaviour in wound healing, and calibrate these using experimental data. After completing her PhD in 2015, she spent two years as a postdoc at Manaaki Whenua (a Crown Research Institute for environment and biodiversity) before taking on a Researcher position there. Rachelle’s current research combines modelling theory with data from ecological systems to guide conservation management.
The role of statistics and computing in public and social policy
Corporations are collecting and mining mountains of data to make better consumers of us all, but there are also vast quantities of data being gathered by public organisations for administrative and policy purposes.
The 2020 Ihaka Lecture Series brings together three experts to discuss the challenges and rewards of applying data science to societal issues.
Our thanks to The New Zealand Statistical Association who are our official sponsors for the 2020 Ihaka Lecture Series.
The triumph of the quants?: Model-based poll aggregation for election forecasting
Professor Simon Jackman, Chief Executive Officer at the United States Studies Centre, will examine recent successes and failures of predictive models of election outcomes. Professor Jackman will also discuss trends and discontinuities in the evolution of public opinion over election campaigns, spatial smoothing and pollster biases.
Machine learning for causal inference: Magic elixir or fool’s gold?
Professor Jennifer Hill from New York University will review the conceptual issues involved in understanding causal mechanisms and describe the potential for machine learning to improve our understanding of these mechanisms.
Implementing a machine learning tool to support high-stakes decisions in child welfare: A case study in human centred AI (cancelled)
Professor Rhema Vaithianathan, from the Centre for Social Data Analytics at AUT, will reflect on what we can learn about applying data analytics in a trusted way, covering key concepts like consent, transparency, fairness and community voice, and how they can contribute to project success or failure.
Rise of the machine learners: Statistical learning in the computational era
Whether labelled as machine learning, predictive algorithms, statistical learning, or AI, the ability of computers to make real-world decisions is rising every year.
The 2019 Ihaka Lecture Series brought together four experts at the interface of statistics and computer science to discuss how computers do it, and how much we should let them.
Our thanks to The New Zealand Statistical Association who are our official sponsors for the 2019 Ihaka Lecture Series.
Open source Machine Learning @ Waikato
Professor Bernhard Pfahringer from the Machine Learning research group at the University of Waikato discusses open-source Machine Learning software suites. He reflects on their design and their position in the current international Machine Learning landscape.
Deep learning: why is it deep, and what is it learning?
University of Auckland Professor Thomas Lumley discusses the rise of neural networks. He provides insight into how deep convolutional nets are structured and how they can be effective, but also why they are brittle and can fail in remarkably alien ways.
Algorithmic fairness: Examples from predictive models for criminal justice
Dr Kristian Lum from the Human Rights Data Analysis Group discusses the use of predictive models in the criminal justice system. Using examples from predictive policing and recidivism risk assessment she demonstrates how such models could perpetuate and potentially amplify data-encoded biases.
Statistical learning and sparsity
Professor Robert Tibshirani from Stanford University reviews the lasso method for high dimensional supervised learning and discusses some new developments in the area, including the Pliable Lasso, and post-selection inference for understanding the important features.
A thousand words: Visualising statistical data
A picture is worth a thousand words – or perhaps that should be a million numbers. The distillation of data into an honest and compelling graphic is an essential component of modern (data) science.
The 2018 Ihaka Lecture Series displayed the contributions of three experts across different facets of data visualisation.
Myth-busting and apophenia in data visualisation: Is what you see really there?
Plots of data are important tools for observing patterns, but it is easy to imagine patterns that may not exist. Using two protocols, the Rorschach and the lineup, Professor Dianne Cook of Monash University describes some simple tools for helping to decide if patterns are real.
Making colour accessible
University of Auckland Associate Professor Paul Murrell investigates the 'BrailleR' package for R and its difficulties with colour. By making a mountain out of that molehill, Paul embarks on a daring Statistical Graphics journey featuring colour spaces, high-performance computing, Te Reo, and XKCD.
Visual trumpery: How charts lie – and how they make us smarter
With facts and truth increasingly under assault, the use of graphs, charts, maps and infographics has become popular in supporting all manner of spin. Distinguishing information from misinformation is an important skill for any citizen. Alberto Cairo from the University of Miami teaches some guiding principles on how people can become more critical and better-informed readers of charts.
Statistical Computing in the Data Age
Statistics has become essential in the data age. We have an increasing ability to collect vast quantities of data, but often still struggle to make sense of it.
The 2017 Ihaka lectures aimed to highlight the important role that both statistics and computing play in this endeavour.
Expressing yourself with R
Hadley Wickham, Chief Scientist at RStudio, discusses expressing yourself with R.
R and data journalism in New Zealand
Harkanwal Singh, Data Editor at the New Zealand Herald, discusses the use of R in New Zealand's data journalism landscape.
Interactive visualisation and fast computation of the solution path for convex clustering and biclustering
Genevera Allen, Dobelman Family Junior Chair in the Departments of Statistics and Electrical and Computer Engineering at Rice University, discusses clustering as a fundamental tool for exploratory analysis of big data.
Statistical computing in a (more) static environment
Ross Ihaka, Associate Professor in the Department of Statistics at the University of Auckland, discusses the spectrum of statistical computing systems from the dynamic to the very static.