A summer of machine learning and data sovereignty

Dion Wharerau spent the summer as an intern with Te Hiku Media, working to improve their automatic speech recognition model for te reo Māori.

Dion Wharerau spent the summer as an intern with Te Hiku Media.

Dion Wharerau has enjoyed maths since his first years at Kaikohe West School. His teachers supported him by sending him to older classes during maths time, which he says really reinforced his enjoyment of the subject.

These days he is studying for a Bachelor of Science in Computer Science at the University of Auckland. He says that, for him, programming is the perfect combination of problem solving and creativity. Dion continues to enjoy studying maths at university, and is disappointed that he won’t be able to fit in all the courses that interest him before he graduates.

Last year Dion heard about a summer internship with Te Hiku Media and Te Pūnaha Matatini through the Computer Science Tuākana programme.

Te Pūnaha Matatini is a Centre of Research Excellence in complex systems, hosted by the University of Auckland. Te Hiku Media is a charitable media organisation that belongs collectively to the Far North iwi of Ngāti Kurī, Te Aupōuri, Ngāi Takoto, Te Rarawa and Ngāti Kahu.

Māori language revitalisation is a core focus of Te Hiku, and they are working to enable a sovereign digital future for Indigenous languages. One of their key projects is the Papa Reo natural language processing platform.


“I saw the internship and applied for it straight away! I remember going through Te Pūnaha Matatini’s website and really liking everything I read – about problem solving and complexity.”

"I liked the slogan: Complexity is at our heart."

Dion spent the summer of 2021–22 working with the Papa Reo team, applying DeepSpeech data augmentations to their automatic speech recognition model for te reo Māori.

"My project was to increase the robustness of the machine learning model," says Dion. "I worked with a lot of amazing people, and I learned a lot along the way. I made lots of mistakes, and I worked with some really amazing software."

"A big mistake that I corrected early on was [not] asking for help [often enough]. Everyone at Te Hiku was incredibly helpful! Once I started asking for help, things got sorted immediately."

Central to Te Hiku’s natural language processing work is a staunch belief that each community must keep control and sovereignty over its own data.

Learning about data sovereignty was new territory for Dion. "Data sovereignty was something I’d never thought about before, because I’d never really worked in a real-life situation that involved other people’s data."

"I’m incredibly grateful to Te Hiku and Te Pūnaha Matatini, because my internship was a great experience."