Māori speech emotion recognition technology in development

A pioneering system is being created to identify emotions in te reo Māori speech.

Himashi Rathnayake, originally from Sri Lanka, says her interest in under-resourced languages began with her own. Photo: William Chea

A University of Auckland doctoral student is helping develop the world’s first speech emotion recognition system designed specifically for te reo Māori. This technology aims not only to understand spoken words, but also to accurately interpret the emotions behind them.

Himashi Rathnayake, a PhD candidate in the Department of Electrical, Computer and Software Engineering, is part of a research team creating speech technology rooted in Māori cultural understandings of emotion.

The project is being co-developed by Rathnayake alongside Dr Jesin James, Dr Ake Nicholas (Ngāti Te'akatauira, Ngā Pū Toru), Dr Gianna Leoni (Ngāti Kura, Ngāi Takoto, Te Aupōuri), Professor Catherine Watson, and Associate Professor Peter Keegan (Waikato-Maniapoto, Ngāti Porou). The research is also supported by Te Reo Irirangi o Te Hiku o te Ika (Te Hiku Media) and Science for Technological Innovation.

Speech emotion recognition has been widely studied in computing, but only for a small fraction of the world’s more than 7,000 languages. This is the first time such research has been conducted for te reo Māori.

Professor Catherine Watson, Himashi Rathnayake, and Dr Jesin James. Photo: William Chea

Rathnayake, originally from Sri Lanka, says her interest in under-resourced languages began with her own.

“My native language, Sinhala, is also quite underrepresented in technology development,” she says.

“During my studies in Sri Lanka, I learned about artificial intelligence and noticed that most AI technologies only focus on a few major languages. Others, like Sinhala and New Zealand’s Indigenous language, te reo Māori, were often left out. That inspired me to find solutions that help make AI more inclusive for everyone.”

The research team worked with members of the Māori community, who reviewed media provided by Te Hiku Media and completed questionnaires identifying the emotions they perceived.

From this process, the team gathered over 200 emotion-related words, which were later refined into 16 key categories with support from focus group discussions conducted with Māori speakers. Several of these categories were unique to te reo Māori.

Pōuri, for example, may convey sadness, mourning, darkness or remorse; harikoa might express happiness, delight or interest; and hōhā can indicate irritation, boredom or fatigue. Other kupu (words) included haumaru, pai and kaikā.

Dr Jesin James says the research was guided by a community-first approach.

“The researchers envision that technology development for a community should be grounded in the community, carried out in partnership with the community, and ultimately serve the community’s interests,” she says.

“This approach ensures the protection and maintenance of the community’s data sovereignty.”

Dr Jesin James
Dr Jesin James is a co-developer of the speech emotion recognition technology for te reo Māori. Photo: William Chea

Rathnayake says it was essential to question whether the emotion categories commonly used in Western systems – like happy, sad or angry – were truly universal. Overlooking cultural nuances in the design of technology, she says, can have real consequences.

“If a system thinks you’re angry when you’re not, and if it’s because of your language or culture or where you come from, it’s not just inaccurate, it’s unfair.”

The project has been underway for two years and is now entering the final stages of the research phase. Development of the technology itself is set to begin soon.

“These days we’re working with Māori voice actors to record emotional expressions based on the categories we’ve identified,” she says.

“Then we’ll do some acoustic analysis of those expressions and use the recorded database to train a machine learning model.”
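The article does not specify which model or features the team will use, but the pipeline it describes (a labelled database of emotional speech, acoustic features, a trained classifier) can be illustrated with a deliberately simple sketch. Everything below is hypothetical: the two-dimensional "acoustic features" are synthetic stand-ins for real measurements such as pitch or energy statistics, the nearest-centroid classifier is a toy substitute for whatever model the researchers ultimately build, and the labels are example kupu from the article's emotion categories.

```python
# Toy sketch of "train a model on a labelled emotional-speech database".
# Not the project's actual method; purely illustrative.
import math
import random
from collections import defaultdict

def train_centroids(dataset):
    """Average the feature vectors for each emotion label."""
    sums, counts = {}, defaultdict(int)
    for features, label in dataset:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def classify(centroids, features):
    """Return the emotion label whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], features))

random.seed(0)
# Hypothetical 2-D "acoustic features", clustered per emotion category.
data = (
    [([1 + random.gauss(0, 0.1), 0], "harikoa") for _ in range(20)]
    + [([0, 1 + random.gauss(0, 0.1)], "pōuri") for _ in range(20)]
)
model = train_centroids(data)
print(classify(model, [0.95, 0.02]))  # → harikoa
```

A real system would replace the synthetic vectors with features extracted from the voice-actor recordings and the centroid rule with a properly trained machine-learning model, but the training-then-classification structure is the same.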

The resulting system will be trained to recognise emotional tone in Māori speech using that culturally informed dataset. It’s expected to be completed in about 18 months and will be made publicly available.

“This work isn’t just for Aotearoa. It’s a call to technology developers around the world to build technology that’s not only smart, but also culturally sensitive,” says Rathnayake.

“When machines learn to hear us, they should hear all of us.”

Media contact

Jogai Bhatt | Media adviser
M: 027 285 9464
E: jogai.bhatt@auckland.ac.nz