Computer Science
Applications for 2023-2024 are now closed.
Background investigation on the use of generative AI for cognitive assistance in Alzheimer's disease
Supervisors
Vithya Yogarajan
Lynette Tippett
Discipline
School of Computer Science
Project code: SCI065
Project
Alzheimer's patients progressively lose memory function, which can cause behavioural changes that are difficult both for them and for those they interact with. In this work, the student will investigate the nature of those losses and explore what sort of information a cognitive prosthesis based on an adaptive large language model might supply to the patient to mitigate the effects of the memory loss. The student will compile a report on the existing relevant literature and any available datasets.
Requirements
Familiarity with coding in Python, the basics of machine learning and AI.
Some experience with Pytorch would be ideal but not necessary.
Can large language models generate Computer-Aided Design code for 3D models?
Supervisors
Trung Nguyen
Discipline
School of Computer Science
Project code: SCI066
Project
Large language models show astonishing capacity for writing code in various programming languages. Nevertheless, language models are still rarely used to generate 3D models in the form of CAD commands or other programming languages supported by AutoCAD, although some CAD datasets already exist. This project aims to use such datasets to investigate how well large language models can generate code for 3D models given conversational prompts.
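To make the target concrete: the project is about mapping conversational prompts to CAD code. As a toy illustration of that input/output format only (a rule-based stub, not a language model, and emitting OpenSCAD-style syntax as an assumed target), one might write:

```python
import re

def prompt_to_cad(prompt):
    """Toy stand-in for an LLM: map a conversational prompt to CAD code.
    Only two shapes are recognised; a real system would generate freely."""
    m = re.search(r"box\s+(\d+)\s+by\s+(\d+)\s+by\s+(\d+)", prompt)
    if m:
        x, y, z = m.groups()
        return f"cube([{x}, {y}, {z}]);"
    m = re.search(r"sphere\s+of\s+radius\s+(\d+)", prompt)
    if m:
        return f"sphere(r={m.group(1)});"
    return None  # an LLM would handle arbitrary phrasing and shapes
```

An evaluation of the real project would replace this stub with LLM calls and check the generated commands by executing them in a CAD engine.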
Requirements
Familiarity with coding in Python, the basics of machine learning and AI.
Some experience with Pytorch would be ideal but not necessary.
Prototype of LLM based memory prosthesis
Supervisors
Vithya Yogarajan
Lynette Tippett
Discipline
School of Computer Science
Project code: SCI067
Project
In this work, the student will use LLM APIs to prototype a memory prosthesis that notices when short- to medium-term memory is necessary for a conversational task and provides a prompt for the use of that information. The task should be designed to resemble one that people with progressive memory loss might find difficult, but does not need to be specific to this population.
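A minimal sketch of the notice-and-prompt loop, with the LLM replaced by a hypothetical regex-based fact extractor (a real prototype would call an LLM API for both fact extraction and relevance judgement):

```python
import re

class MemoryProsthesis:
    """Toy short-term memory aid: store stated facts, replay them on questions."""

    def __init__(self):
        self.facts = {}  # key phrase -> stored value

    def observe(self, utterance):
        # Remember simple "my/the X is Y" statements (an LLM would extract
        # facts far more robustly here).
        m = re.match(r"(?:my|the)\s+(\w+)\s+is\s+(.+)", utterance.lower().rstrip("."))
        if m:
            self.facts[m.group(1)] = m.group(2)

    def prompt_for(self, question):
        # If the question mentions a remembered key, surface the stored fact.
        for key, value in self.facts.items():
            if key in question.lower():
                return f"Reminder: your {key} is {value}."
        return None
```

The design point is the separation between noticing (observe) and prompting (prompt_for); both halves are where an adaptive LLM would slot in.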
Requirements
Familiarity with coding in Python, the basics of machine learning and AI.
Some experience with Pytorch would be ideal but not necessary.
Study of Adaptive Power-of-Two Quantization for Fixed-Point Representation in Neural Network Models
Project
In this project, the student will study adaptive power-of-two quantization for fixed-point representation in neural network models such as MobileNet and ResNet. The objective is to find the best trade-off between computational complexity and accuracy.
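The core operation, quantizing each weight to the nearest signed power of two so that multiplications become bit shifts in fixed-point hardware, can be sketched as follows (the exponent range here is an illustrative assumption):

```python
import math

def po2_quantize(w, exp_min=-6, exp_max=0):
    """Quantize a weight to the nearest signed power of two.
    exp_min/exp_max bound the fixed-point exponent range (illustrative)."""
    if w == 0:
        return 0.0
    e = round(math.log2(abs(w)))        # nearest power-of-two exponent
    e = max(exp_min, min(exp_max, e))   # clip to the representable range
    return math.copysign(2.0 ** e, w)   # restore the sign
```

Because every quantized weight is ±2^e, each multiply in a layer reduces to a shift; the adaptive part of the project is choosing the exponent range (e.g. per layer) to balance complexity against accuracy.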
Requirements
Students who are interested in this project should be familiar with C or Python programming.
Integrated AI Training and Inference for the Edge using U250
Project
Training and inference demand massive computational resources, typically supplied by expensive and power-hungry GPUs. In this project, the student is asked to develop a unique, integrated, and efficient deep learning training and inference solution for the edge using the Xilinx U250. The objective is to deploy an integrated training-inference solution that retrains the model in real time, in parallel with online inference on the same device.
Requirements
Students who are interested in this project should be familiar with C/C++ language and interested in learning the new FPGA tools and related developing platform.
Development of Design Automation Software for the Design of Integrated Circuits
Project
In this project, the student will develop electronic design automation (EDA) software for the design of integrated circuits. Before EDA, integrated circuits were designed by hand and manually laid out. Today, most of those design processes are performed automatically by EDA software.
Requirements
Students who are interested in this project should be familiar with C/C++ and have strong background knowledge in algorithms and data structures (heaps, doubly linked lists, quicksort, trees, and graphs), but NO circuit knowledge is required.
An Ethical Computing Toolkit for New Zealand
Supervisor
Vithya Yogarajan
Discipline
School of Computer Science
Project code: SCI072
Project
Overall aim
The overall aim of this project is to compare and evaluate different ethical computing toolkits based on their suitability for addressing the ethical challenges in machine learning in the context of New Zealand.
Project description
This project will involve a comprehensive analysis and evaluation of ethical computing toolkits to determine their suitability for New Zealand. The student will conduct a thorough literature review to identify relevant frameworks and methodologies used in the field of ethical computing. Based on the literature review, a set of promising toolkits will be selected for further evaluation.
The student will develop evaluation criteria specifically tailored to New Zealand's needs, considering data sovereignty, data gathering, model transparency, explainability, fairness, accountability, and robustness, as well as potential future risks.
To assess the real-world effectiveness of the selected toolkits, a case study will be conducted using publicly available New Zealand data. The case study will involve applying the toolkits to the dataset, evaluating their performance in addressing ethical concerns, and analyzing their limitations and strengths.
Based on the findings, the student will provide recommendations and guidelines for selecting and implementing ethical computing toolkits in New Zealand, taking into account the unique requirements and challenges of the country.
Data
The student will utilise publicly available New Zealand data for the case study. The specific datasets will be determined during the project, considering their relevance to ethical computing and machine learning.
Desired output
The desired output of this project includes:
- Literature review summarising the current state of ethical computing toolkits and their applicability to New Zealand
- Comparative analysis report evaluating the selected toolkits based on their compatibility with New Zealand's requirements, focusing on data sovereignty, data gathering, and potential future risks
- Case study report showcasing the application of the selected toolkits using publicly available New Zealand data, including an analysis of their performance and limitations
- Recommendations and guidelines for selecting and implementing ethical computing toolkits in New Zealand, considering the unique context of the region
Preferred skills
We are looking for a highly motivated student with the following preferred skills:
- Familiarity with ethical considerations in computing and machine learning
- Understanding of machine learning approaches
- Data analysis and Python programming skills
Predicting influenza disease burden in NZ
Supervisor
Steffen Albrecht
Discipline
School of Computer Science
Project code: SCI073
Project
Background
With the opening of international borders, influenza virus circulation has recurred in New Zealand (see Figure). Anecdotal evidence from practising clinicians in several New Zealand hospitals is that influenza and influenza-related hospitalisations were more complicated in 2022, with significant proportions of children presenting with complicated lower respiratory infections, and with non-respiratory presentations and complications.
Figure: Influenza virus isolates reported in New Zealand (all ages), weeks 1-23, 2022.
Research question
Using machine learning, we want to answer the question: “What is the epidemiology and clinical spectrum of influenza hospitalisations among children in New Zealand in 2022?”
Data availability
We have a number of medical data sources available to us that will help us address the question.
Desired outputs
Machine learning models that determine the factors that influence influenza hospitalisations amongst children in NZ.
Preferred skills
We are looking for a highly motivated student with the following skills:
- Understanding of machine learning approaches
- Data analysis and Python programming skills
VR.net: Curating a large-scale real-world dataset for VR motion sickness research
Project
VR gaming has gained widespread popularity in recent years, with annual market revenue projected to reach $87 billion by 2023. However, up to 40% of users suffer from VR motion sickness, with symptoms such as fatigue, disorientation, and nausea.
Recently, researchers have proposed using machine learning (ML) approaches to identify motion sickness risk factors in VR content. However, many of these studies report the need for more training data: a large-scale dataset containing many hours of VR gameplay clips and the corresponding risk factor labels. The video clips should also come from diverse real-world game genres to ensure generalisation. Building such a dataset is challenging, since manual labelling would require an infeasible amount of time.
In this project, you will build an automatic data collection tool to extract labelled data from real-world VR games. The data may include gameplay video, 3D object/camera movement, and VR headset/joystick movement. You will use a software engineering technique called code instrumentation, in which custom code is dynamically injected into low-level system graphics stacks to intercept the relevant data.
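At a much smaller scale, the same intercept-and-record idea behind code instrumentation can be illustrated in Python with a wrapper that logs every call to a (hypothetical) graphics function before forwarding it:

```python
import functools

trace = []  # intercepted call log, analogous to the tool's captured data

def instrument(fn):
    """Wrap a function so every call is recorded before running the original."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        trace.append((fn.__name__, args))
        return fn(*args, **kwargs)
    return wrapper

@instrument
def set_camera_pose(x, y, z, yaw, pitch, roll):
    # Stand-in for a low-level graphics call the real tool would hook.
    return (x, y, z, yaw, pitch, roll)

set_camera_pose(0.0, 1.6, 0.0, 90.0, 0.0, 0.0)
```

The actual project injects at the system graphics stack (below the game), so no game source code is needed; the wrapper above is only an analogy for the intercept point.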
You will test the data collection tool by playing various real-world VR games (e.g., Beat Saber or Epic Roller Coaster). We will provide two sets of VR goggles and game copies. You can enjoy the games as long as you like. After the gameplay, you are also encouraged to show the utility of the collected dataset by building a simple machine-learning model. For instance, given a one-second gameplay video, predict whether the camera is doing multi-axis rotation.
Requirements
1) Experience with Python and C++ Programming
2) Solid knowledge of operating systems
3) Comfort with evaluating VR programs
Solving Scalability challenges for a Green Computing Hub
Project
This project explores how scalability challenges can be solved in a way that fits the mission of a green computing hub.
Requirements
Skills include database and web application experience and the openness to explore and evaluate new cloud computing solutions.
AWS deployment of a mobile App for diabetes type 2 patients
Project
This project focuses on deploying an existing software implementation onto AWS for showcase demonstrations. The current mobile and server-side applications were developed by two groups of students in the past. During the deployment process, extensions to the current software (code) will be required. The project also requires an evaluation of the outcome.
Requirements
The required skill set includes mobile application development and AWS deployment experience, etc. The duration of the project would be 10 weeks.
Developing teaching materials on the ethical use of AI
Project
In this project, you will develop comprehensive teaching materials to educate students on ethical considerations when developing and using AI tools (e.g., ChatGPT) and models.
Your task is to create a curriculum that introduces students to the potential misuse of AI tools and models for malevolent and/or unethical purposes. The teaching materials should cover key ethical principles, explore real-world case studies, and provide practical guidelines and strategies to prevent and mitigate misuse. Deliverables may include slides which could be used in lectures, as well as demonstrations in the form of a website, Jupyter Notebooks, or any other innovative medium you think would be relevant to the task.
You will work in collaboration with members of the Ethical Computing project within the School of Computer Science.
Your work will empower future software developers to proactively consider the ethical implications of their work and contribute to the responsible and accountable development of AI technologies.
It will also empower AI users with a better understanding of the ethical implications of using AI technologies.
Theoretical foundations of machine learning
Supervisors
Jesse Goodman (Statistics)
Pedram Hekmati (Mathematics)
Simone Linz (Computer Science)
Discipline
School of Computer Science
Mathematics
Statistics
Project code: SCI078
Project
Machine learning and, more broadly, artificial intelligence will continue to change everyone’s life profoundly. Mathematics, statistics, and computer science play an important role in advancing machine learning algorithms (e.g., making them more reliable), and theoretical research into machine learning can advance our understanding of why certain methods are successful or not.
In this project, you will study some theoretical foundations of machine learning algorithms and how techniques from probability theory, geometry, and graph theory can be leveraged to aid the design of machine learning algorithms.
The exact direction of the project will be decided at its start and depend on the interests and experience of the summer student.
Improve functionality and performance of assignment automarker used in algorithms classes
Project
We need to add some new features to one of the automated marking platforms that we use in computer science algorithms courses.
We also want to fix a few operational issues and improve performance (moving from sequential to parallel execution) for test/exam environments.
Requirements
You will need to be fluent (or learn quickly) in Linux, Docker, Java, PHP, and possibly JavaScript.
Test/exam generator for real data/computer scientists
Project
Using an expandable database of algorithm questions, we want to automate the generation of either a fixed exam or a set of individual exams. Statistics from marking scripts will be used to support the ranking of good and bad questions.
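The generation step might be sketched as follows, assuming the question bank has already been loaded from the database; one shared seed yields a fixed exam for everyone, while per-student seeds yield individual but reproducible papers:

```python
import random

def generate_exam(bank, n, seed):
    """Reproducibly draw n questions from the bank: the same seed always
    yields the same paper, so a shared seed gives one fixed exam and
    per-student seeds give individualised ones."""
    rng = random.Random(seed)
    return rng.sample(bank, n)
```

Marking statistics could then be joined back against the drawn question IDs to support the good/bad question ranking mentioned above.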
Requirements
Knowledge of XML/SQL and LaTeX. Python (or possibly Java) as the development language. Command-line tools are desired initially, but a GUI could be added if time permits.
Pose Detection using Machine Learning
Project
Many people perform physical exercises, such as Yoga, at home. It is crucial to execute these exercises correctly. Incorrect execution can render the exercises ineffective and may even result in bodily harm. Recognizing that not everyone has a personal trainer, we aim in this project to develop a machine learning model that can automatically determine whether people are performing exercises correctly by analysing their poses during the exercise. We will use Yoga as an example exercise in this project.
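One common building block, assuming a pose-estimation network has already produced 2D joint coordinates, is to compare joint angles against a reference pose; a minimal sketch:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by keypoints a-b-c
    (e.g. shoulder-elbow-wrist from a pose detector)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

def pose_ok(angle, reference, tolerance=15.0):
    """Crude correctness check: within tolerance of the reference angle."""
    return abs(angle - reference) <= tolerance
```

A learned model (e.g. a video transformer, as suggested above) would replace the hand-set reference and tolerance with patterns learned from correctly performed exercises.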
Requirements and skills gained
Proficiency in Python programming is essential for the project. While working on this project, the student is expected to acquire knowledge in building neural networks and using existing deep learning networks, such as vision/video transformers and others.
Scanning Teleform Sheets
Supervisors
Patrice Delmas
Discipline
School of Computer Science
Project code: SCI082
Project
Our university uses Scantron Teleform sheets for examinations and tests where students answer multiple choice questions by shading a bubble in their Teleform sheet. The sheets are subsequently scanned by a machine and the answer choices are processed for grading.
Unfortunately, the scanning centre becomes a hotspot during examinations, creating a bottleneck in processing and releasing grades.
In this project, you would work on recognizing the answer choices in scanned PDF Teleform sheets (of the format that UoA uses) and provide a text file that encodes the answers in each PDF Teleform. A command-line application (CLI) is all that is required.
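The project requires C#, but the per-bubble decision at the heart of the task, judging whether a region of the scan is dark enough to count as shaded, can be sketched (in Python, with illustrative thresholds) as:

```python
def bubble_filled(pixels, threshold=0.5):
    """pixels: 2D list of grayscale values (0-255) covering one bubble region.
    Returns True when mean darkness exceeds the (illustrative) threshold."""
    flat = [p for row in pixels for p in row]
    darkness = sum(255 - p for p in flat) / (255 * len(flat))
    return darkness >= threshold

def read_answer(bubbles, choices="ABCDE"):
    """bubbles: one pixel block per choice. Returns the single shaded
    letter, or None for blank/multiple marks (flagged for review)."""
    marks = [choices[i] for i, b in enumerate(bubbles) if bubble_filled(b)]
    return marks[0] if len(marks) == 1 else None
```

The real CLI would first locate each bubble region in the scanned PDF (deskewing against the Teleform registration marks) before applying a decision like this one.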
A similar project catering for a Teleform sheet of a different type exists on GitHub: https://github.com/floft/freetron#readme You may wish to use this as the starting point.
A C# implementation will be required.
Scaling up Sparse Neural Networks with Randomised Hashing
Project
Sparse neural networks have gained considerable attention due to their potential to reduce computational complexity and energy consumption in machine learning tasks. However, achieving scalability while maintaining high performance remains a challenge. This research proposal aims to investigate the integration of randomised hashing techniques into sparse neural networks to enable effective scaling, improved performance, and increased efficiency.
Objectives
- Develop a novel framework for scaling up sparse neural networks using randomised hashing techniques.
- Investigate the impact of randomised hashing on the performance of sparse networks, focusing on accuracy, training convergence, and computational efficiency.
- Optimise the training process of scaled-up sparse networks with randomised hashing to achieve performance competitive with dense networks.
- Evaluate the trade-offs between performance, sparsity, and computational cost in scaled-up sparse networks on GPUs.
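The hashing idea underpinning the proposal can be illustrated with random-projection (SimHash-style) bucketing, where the sign pattern of a few random projections assigns similar vectors to the same bucket, so only the neurons in a matching bucket need be activated:

```python
import random

def simhash_bucket(vec, planes):
    """Locality-sensitive hash: the sign of each random projection
    contributes one bit of the bucket id, so nearby vectors collide."""
    bits = 0
    for plane in planes:
        dot = sum(v * w for v, w in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

# 8 random hyperplanes in 4-D -> 256 possible buckets (toy sizes).
rng = random.Random(0)
planes = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
```

Note that the hash depends only on projection signs, so positively scaled inputs always land in the same bucket; frameworks such as the LSH_DeepLearning reference below exploit this to select a sparse subset of neurons per input.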
Skills
TensorFlow, Pytorch, CUDA-GPU, locality-sensitive hashing, random projection
References (papers with code)
1. https://github.com/zahraatashgahi/CTRE
2. https://github.com/rdspring1/LSH_DeepLearning
Characterisation and Analysis of Reddit Financial Communities
Project
The surge in prices of meme stocks such as GameStop and AMC, initiated by regular investor members of the WallStreetBets Reddit community, has highlighted the power of social media. This project will characterise usage of the financial subreddit using web scraping techniques. Basic characteristics such as the number of posts, comments, and users will be measured.
We will also study a variety of features, such as the density of author posts and the timeline of posts and comments over the subreddit's existence. We will also analyse the stocks discussed by the community and their impact on stock prices. Analysis of semantics, NLP, and sentiment analysis may also be required. Data needs to be scraped from Reddit.
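As a sketch of the kind of basic characterisation involved, assuming posts have already been scraped into simple records (the field names and the crude all-caps ticker heuristic below are illustrative assumptions):

```python
from collections import Counter

# Hypothetical scraped records; real data would come via the Reddit API.
posts = [
    {"author": "u_alpha", "text": "GME to the moon"},
    {"author": "u_beta",  "text": "thinking about AMC"},
    {"author": "u_alpha", "text": "still holding GME"},
]

# Posts per author (the "density of author posts" characteristic).
posts_per_author = Counter(p["author"] for p in posts)

# Crude ticker spotting: short all-caps tokens. A real study would match
# against an exchange symbol list and filter common English words.
ticker_mentions = Counter(
    word for p in posts for word in p["text"].split()
    if word.isupper() and 2 <= len(word) <= 5
)
```

Mention counts over time could then be aligned with price series to study the community's impact on the discussed stocks.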
Requirements
Abilities in Python or similar programming, basic statistical analysis, web scraping, natural language processing
Characterising NFT Marketplaces
Project
The aim of this measurement and characterisation study is to understand the market and user dynamics in NFT marketplaces. The data will be collected from one of the larger marketplaces, such as OpenSea, or from niche marketplaces focusing on art or a specific collectible. The objective is to perform a longitudinal study with data spanning the past four years, since NFTs gained traction in the mainstream.
You will perform a comprehensive characterisation study, examining various aspects of the marketplace such as user behaviour, popularity, market dynamics, economic factors, and network science. Not much academic research has been conducted on this topic, so there is a lot of scope for interesting analysis.
Requirements
Abilities in Python or similar programming, basic statistical analysis, web scraping, visualization, network science, ML.
Characterizing Decentralized Social Blockchain (DeSo) and Decentralized Finance (DeFi) sites
Project
The aim of this measurement and characterization study is to understand the emerging DeSo applications. One such application is BitClout, which is a Crypto Social network similar to Twitter. The data will be collected from the DeSo platforms to better understand the design, implementation, usage, and dynamics of the platforms. Similar analysis can then be applied to DeFi systems to better understand their salient features.
Requirements
Strong programming skills, a good background in statistics, background in Blockchain. Network Science (Social Networks/Graph Structures) is a plus. Excellent writing skills.
Cybersecurity using machine learning in IoT and digital twins
Project
We will study the use of machine learning algorithms such as SVM, decision trees, and random forests for detecting denial-of-service attacks in IoT networks. This project will focus on the software-defined networking paradigm with its central viewpoint. The analysis can be performed via simulation (e.g., Mininet) or on existing freely available datasets. The objective is to identify features that can classify the attacks and to compare the performance of various machine learning techniques. Deep learning can also be used to detect attacks, and its performance analysed.
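Before any classifier is trained, packet captures must be turned into features. A minimal sketch with a naive rate-threshold baseline (the packet tuple format and threshold are illustrative assumptions; the project would replace the rule with trained classifiers):

```python
from collections import defaultdict

def per_source_rates(packets):
    """packets: (timestamp_seconds, src_ip) pairs.
    Returns packets/second per source, a classic DoS feature."""
    times = defaultdict(list)
    for ts, src in packets:
        times[src].append(ts)
    rates = {}
    for src, stamps in times.items():
        span = max(stamps) - min(stamps)
        rates[src] = len(stamps) / span if span > 0 else float(len(stamps))
    return rates

def flag_dos(rates, threshold=100.0):
    """Naive baseline: flag sources exceeding the rate threshold.
    SVM / decision-tree / random-forest models would replace this rule,
    taking such features as input."""
    return {src for src, rate in rates.items() if rate >= threshold}
```

In the software-defined networking setting, the controller's central viewpoint makes collecting such per-source statistics straightforward.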
Requirements
Good programming skills, TCP/IP, Mininet or similar simulator, machine learning
Machine learning for social good in environment: Developing an advanced warning system for predicting extreme weather events
Supervisors
Gillian Dobbie
Daniel Wilson
Centre of Machine Learning for Social Good
Discipline
School of Computer Science
Project code: SCI088
Project
Floods, the most prevalent of natural disasters, impact over 250 million people annually, leading to economic damages of approximately $10 billion. Our project aims to address this pressing issue by developing an advanced warning system that leverages multi-modality generative machine learning.
By investigating the potential impacts of extreme weather events, particularly flooding, and integrating valuable data on green, grey, and blue infrastructure, we strive to equip policymakers with the necessary tools to make informed decisions. Our ultimate goal is to empower organisations and individuals, enabling them to take proactive measures to mitigate damage and save lives.
Be a part of the Centre of Machine Learning for Social Good: Join our dynamic team. We are committed to advancing fundamental knowledge in machine learning and data analytics while tackling the most challenging health, environmental, and societal problems of our time.
As the first center in Aotearoa dedicated to utilizing machine learning for social good, we collaborate closely with domain experts, leveraging their expertise as a catalyst to address high-impact societal issues.
By participating in this project, you will contribute to the development of a prototype for an open-sourced early event warning system. This system will revolutionize the way extreme weather events, such as floods, are predicted and managed. Your role will involve engaging in meaningful discussions with our collaborators to refine the system's design and functionality, ensuring its effectiveness and usability.
Project Output
Prototype of an Open-Sourced Early Event Warning System
Requirements
To excel in this project, proficiency in Python programming, including Keras or Pytorch, and a strong understanding of machine learning fundamentals are essential. Familiarity with large-language models is considered advantageous. We are seeking individuals who possess a deep passion for creating impactful outcomes that positively influence society.
Machine Learning for Social Good in Health: Automated Machine Learning for early prediction of acute pancreatitis severity
Supervisors
Yun Sing Koh
Daniel Wilson
Centre of Machine Learning for Social Good
Discipline
School of Computer Science
Project code: SCI089
Project
Acute pancreatitis is a complex condition with varying degrees of severity, and accurate prediction plays a crucial role in guiding timely interventions and improving patient outcomes.
Our project focuses on developing an automated machine learning system capable of predicting the severity of acute pancreatitis at an early stage. Early identification of severe cases can lead to timely interventions and improved patient management. Previous approaches use traditional neural networks; going beyond these, we will investigate current state-of-the-art machine learning approaches.
Be a part of the Centre of Machine Learning for Social Good: Join our dynamic team. We are committed to advancing fundamental knowledge in machine learning and data analytics while tackling the most challenging health, environmental, and societal problems of our time. As the first center in Aotearoa dedicated to utilizing machine learning for social good, we collaborate closely with domain experts, leveraging their expertise as a catalyst to address high-impact societal issues.
Output
The automated prediction system has the potential to enable healthcare professionals to intervene early, allocate resources effectively, and provide personalized treatment plans to acute pancreatitis patients.
Requirements
- Strong programming skills, particularly in Python, and familiarity with machine learning libraries (e.g., scikit-learn, TensorFlow)
- Basic understanding of machine learning concepts and algorithms
- Ability to work independently and collaboratively in a research team
- Attention to detail and analytical mindset
- Passion for improving healthcare outcomes through technology
Skills developed
By participating in this project, you will have the opportunity to work alongside experts in the field, gain valuable research experience, and contribute to advancing the field of healthcare analytics. This project combines the power of machine learning algorithms, medical data, and clinical expertise to create a robust and efficient prediction model.
Machine learning for social good in New Zealand: Current and future
Supervisors
Gillian Dobbie
Daniel Wilson
Centre of Machine Learning for Social Good
Discipline
School of Computer Science
Project code: SCI090
Project
This project aims to explore, compare, contrast and design engagement approaches for transdisciplinary social good projects specifically tailored to the unique context of New Zealand. This project offers a valuable opportunity to delve into the field of ML for Social Good, culminating in a comprehensive study on current and future approaches for utilizing machine learning to address societal challenges.
Our project focuses on examining how machine learning can be effectively designed and applied to address pressing social issues in New Zealand. We will explore existing machine learning for social good approaches, identify areas where tailored solutions can have the most impact, and consider how the tools built can remain sustainable beyond the life of the projects. The culmination of this project will be a comprehensive study that outlines current approaches, evaluates their efficacy, and provides insights into future directions.
New Zealand Context
By working on ML for Social Good within the unique context of New Zealand, you will have the opportunity to understand the specific challenges and needs of local communities. Through collaborations with experts and stakeholders, you will contribute to designing machine learning solutions that are relevant, culturally sensitive, and effective in addressing social issues specific to New Zealand.
Output
Study on ML for Social Good Approaches: The ultimate outcome of this project will be a comprehensive study that investigates and documents the current landscape of ML for Social Good approaches, with a specific focus on New Zealand. The study will delve into the effectiveness, challenges, and future prospects of these processes and approaches, providing valuable insights for researchers, policymakers, and practitioners.
Be a part of the Centre of Machine Learning for Social Good. We are committed to advancing fundamental knowledge in machine learning and data analytics while tackling the most challenging health, environmental, and societal problems of our time. As the first center in Aotearoa dedicated to utilizing machine learning for social good, we collaborate closely with domain experts, leveraging their expertise as a catalyst to address high-impact societal issues.
Requirements
- Strong research and analytical skills
- Familiarity with machine learning concepts and algorithms
- Proficiency in data analysis and report writing
- Ability to work independently and collaborate in a research team
- Passion for utilizing technology for social good.
Using AI to predict behaviour in team sports
Project
Using AI to analyse team players' behaviour during games, from video recordings and data analytics (such as body sensors) provided by sports channels, has the potential to improve players' and teams' performance, along with other applications for viewers.
The student will first provide a brief overview of the state of the art, both for behavioural analysis in team sports and for existing annotated datasets.
Leveraging the above, and using our professional sports and broadcasting partners' expertise and datasets, the student will trial the best existing machine learning techniques for individual behaviour tracking and, depending on progress, will attempt to link this behaviour to recorded game events (as provided by our sports team partner) such as fouls, scoring, and injuries. The sports studied will be one or more of the following, based on available datasets and task complexity: basketball, netball, rugby league, soccer, and rugby union.
Requirements
Strong motivation and a willingness to learn. Some Python programming capability and the ability to pick up existing tools and knowledge. Computer vision knowledge is not essential, although the candidate will need to pick up relevant skills in computer vision and machine learning along the way.
Full-body interaction with AI in a Virtual Reality installation
Supervisors
Dr Becca Weber
Discipline
School of Computer Science
Dance Studies
Project code: SCI092
Project
This project explores the future of interacting with AI – full-body interaction with AI agents within an immersive environment where users experience responsive audio-visual feedback based on real-time body tracking. The student will integrate AI agents into a Unity code base for a participative installation. The research goal is to understand how the AI agents and effects impact sensory perception, embodiment, and subjective experiences.
Over the summer, we will work with dance experts to iteratively develop the AI agents and interaction. This summer research project will contribute to an installation that will be made public. It is related to a larger project that is likely to lead to topics of masters and PhD studies and to collaborations with other researchers in universities abroad.
Augmented reality stroke rehabilitation game from te whare tapa whā
Project
We reconceptualise healthtech for elders from the Māori model 'te whare tapa whā’ (the four cornerstones of health; Durie, 1994): it builds on iwi involvement and concurrently supports physical, mental, whānau (family), and spiritual health with interactive activities and experiences. Our team includes a partnership with ARA, a Māori augmented reality development company, with which we iteratively co-designed the first version of the software with Māori communities. Together, we will interact directly with kaumātua (elders) to better understand their experiences and motivation to engage in their rehabilitation. We will collaborate with researchers at Auckland Hospital to prepare a feasibility study of the prototype.
Automatic assessment of accessibility, visual design, and interactivity of websites
Project
Web technologies are foundational and continue to be widespread, with front-end development skills in high demand. This project pursues the automatic assessment of websites dynamically, by executing them as they would run in a typical browser, using the Selenium WebDriver framework. In this project you will write custom code to assess and interact with web components through browser-specific drivers, expanding functionality to assess visual Gestalt principles and interactivity programmatically. Selenium enables remote control of a browser and mimics user actions, including button clicks, drag-and-drop selection, checkbox toggles, key presses, taps, and scrolling. The use of this tool is educational, supporting an increased understanding of accessibility guidelines and visual design skills.
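One concrete accessibility check such an assessment can make, once Selenium has extracted foreground and background colours from rendered elements, is the WCAG 2.x contrast ratio; a self-contained sketch of that computation:

```python
def _rel_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour (0-255 channels)."""
    def chan(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (chan(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colours; WCAG AA requires >= 4.5
    for normal body text. Black on white gives the maximum, 21:1."""
    l1, l2 = sorted((_rel_luminance(fg), _rel_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)
```

In the project, the rgb inputs would come from computed CSS styles queried through the WebDriver, with this check applied element by element.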
AI for Climate Change: Automated detection of urchin barren from underwater imagery
Supervisor
Co-supervisors
Patrice Delmas, Arie Spyksma (Institute of Marine Science)
Discipline
School of Computer Science
Project code: SCI095
Project
Kelp forests are among the most productive ecosystems on Earth, but climate-driven impacts are causing widespread loss of kelp habitat. For example, the climate-driven proliferation of the long-spined sea urchin is one of the most urgent threats to kelp forests in south-eastern Australia and north-eastern New Zealand.
Assessing this threat requires the collection and analysis (typically manual) of underwater imagery spanning tens to hundreds of kilometres of reef. The high contrast of sea urchins against barren reef makes this an ideal candidate for modern computer vision solutions based on machine learning (ML), which could dramatically improve annotation and analysis.
Using existing image-based monitoring data, you will develop and test ML algorithms to detect the presence and extent of urchin barren expansion in Australia and New Zealand.
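To see why the high contrast matters, the toy baseline below flags connected dark blobs on a bright background using a threshold and a flood fill. It is only an illustration of the detection task on invented data; a real solution would replace it with a trained convolutional detector, and the function name and thresholds are made up.

```python
def detect_urchins(image, threshold=0.3, min_size=3):
    """Toy baseline: flag connected dark regions (urchins appear as
    high-contrast dark blobs on pale barren reef). `image` is a 2D list
    of grey values in [0, 1]; a real pipeline would use a trained CNN."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    detections = []
    for y in range(h):
        for x in range(w):
            if image[y][x] < threshold and not seen[y][x]:
                # flood-fill the dark blob starting from this pixel
                stack, blob = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    blob.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                                and image[ny][nx] < threshold:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(blob) >= min_size:  # ignore single-pixel noise
                    detections.append(blob)
    return detections

# A bright 6x6 reef patch with one 2x2 dark blob yields one detection.
reef = [[0.9] * 6 for _ in range(6)]
for y, x in ((2, 2), (2, 3), (3, 2), (3, 3)):
    reef[y][x] = 0.1
print(len(detect_urchins(reef)))  # 1
```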
Recommended skills
This project is suitable for students with basic skills in maths, statistics, machine learning, and image analysis, plus intermediate programming skills in Python. Familiarity with convolutional neural networks and programming experience in PyTorch will be beneficial, but are not necessary and can be learned while working on the project.
Machine learning for mass spectrometry data analysis
Supervisor
Co-supervisors
Patrice Delmas, Arie Spyksma (Institute of Marine Science)
Discipline
School of Computer Science
Project code: SCI096
Project
As new mass spectrometry (MS) technologies are rapidly developed to cope with the complexity of biological samples emerging in environmental and biomedical sciences, standard tools for MS analysis fail to exploit the full potential of the data these technologies offer. For example, top-down tandem MS has been extremely useful in studying metal-protein interactions and is relevant to the development of anti-cancer metal-based drugs. Manual identification of the binding sites of metal-based drugs is extremely difficult and error-prone, and often only the most intense peaks get assigned. Nevertheless, it remains the common approach in the absence of effective automated methods.
New computational methods are therefore needed, and machine learning (ML) algorithms would be particularly valuable for coping with the complexity, noise, and volume of MS data. More specifically, the overall problem resembles challenges in ML for time series analysis. You will investigate the use of time series analysis techniques to match and identify specific peak patterns in MS data.
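One classical time series technique the project could start from is normalised cross-correlation: slide a known peak template along an intensity trace and score how well each window matches. The sketch below is a minimal pure-Python illustration; the trace and template are invented toy data, not real MS spectra.

```python
import math

def best_match(signal, template):
    """Slide a peak template along an intensity trace and return the offset
    with the highest normalised cross-correlation (a simple first step for
    matching known peak patterns, e.g. isotope envelopes, in MS data)."""
    n, m = len(signal), len(template)
    t_mean = sum(template) / m
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))
    best, best_score = 0, -float("inf")
    for off in range(n - m + 1):
        window = signal[off:off + m]
        w_mean = sum(window) / m
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if w_norm == 0 or t_norm == 0:
            continue  # a flat window carries no pattern information
        score = sum(a * b for a, b in zip(w_centered, t_centered)) / (w_norm * t_norm)
        if score > best_score:
            best, best_score = off, score
    return best, best_score

# A known peak pattern hidden at offset 5 in a noisy trace.
trace = [0, 1, 0, 0, 0, 10, 2, 8, 0, 1, 0]
print(best_match(trace, [10, 2, 8])[0])  # 5
```

Real MS traces bring non-uniform m/z spacing, baseline drift, and heavy noise, which is exactly where learned (rather than fixed) matching would be investigated.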
Recommended skills
Basic knowledge of machine learning, good knowledge of Python, openness to learn about mass spectrometry data and to collaborate with chemists.
Reliable machine learning for predator identification
Supervisor
Co-supervisors
Patrice Delmas, Arie Spyksma (Institute of Marine Science)
Discipline
School of Computer Science
Project code: SCI097
Project
Large datasets are now routinely collected from digital cameras and other sensing technologies; to address biosecurity problems (such as predator identification) successfully at operational scale, these data need to be integrated and analysed efficiently and intelligently. Recent advances in low-cost sensing technology, computer vision, and deep learning have opened new opportunities for developing zero-tolerance technology for predator monitoring and trapping in large forested and complex environments.
While deep learning models have seen enormous success in computer vision due to their high expressiveness compared to traditional shallow models, they lack well-motivated methods for accurately estimating their confidence in a prediction. They can be "overconfident" on images that humans would clearly rule out as irrelevant to the prediction task, or as not even containing the object of interest.
This project will investigate methods for quantifying uncertainty in deep learning model predictions. The goal is to develop actionable deep learning models that are safe to deploy in real-world applications requiring reliable detection of predators in image-based sensing data.
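A simple baseline for the kind of confidence estimate discussed above is the entropy of a model's softmax output: near zero for a confident prediction, near log(K) for a maximally uncertain one over K classes. The sketch below illustrates this on raw logits; the project itself would likely investigate richer approaches (e.g. MC dropout or deep ensembles), and the logit values here are invented.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw model scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predictive_entropy(logits):
    """Entropy of the softmax distribution, in nats: 0 for a fully
    confident prediction, log(K) for a uniformly uncertain one."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = predictive_entropy([9.0, 0.0, 0.0])   # one class dominates
uncertain = predictive_entropy([1.0, 1.0, 1.0])   # no class preferred
print(confident < uncertain)  # True
```

A deployment could refuse to act (or escalate to a human) whenever entropy exceeds a threshold, which is one way to make detections "actionable" rather than blindly trusted.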
Recommended skills
Basic knowledge of machine learning and good knowledge of Python are essential; an understanding of deep learning networks and programming experience with PyTorch (or TensorFlow/Keras) would be beneficial.
Auditing Artificial Intelligence with Adversarial Learning
Project
We aim to design and develop new methods for attacking machine learning models, and to use these adversarial attacks to define a measure of reliability. Weak model performance caused by unrepresentative datasets or flaws in the training process is a common issue in machine learning, leading to misclassification and model unfairness. We will develop a framework that identifies adversarial regions of the data space in which models are prone to fail. The framework will not only identify these regions and data, but also provide tools to improve the model, and return a score reflecting the model's reliability. This score can be used to certify models without access to the training process and to estimate the applicability of a model to specific use cases.
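To make the idea of an adversarial attack concrete, the sketch below applies a fast-gradient-sign-style perturbation to a toy logistic model. The weights, input, and epsilon are invented for illustration; the project's framework would operate on full models and datasets rather than a hand-built classifier.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fgsm_attack(x, w, b, y_true, eps):
    """Fast-gradient-sign-style perturbation of input x against a logistic
    model p(y=1|x) = sigmoid(w.x + b): move each feature by eps in the
    direction that increases the cross-entropy loss for the true label."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    # d(loss)/dx_i = (p - y_true) * w_i for the cross-entropy loss
    grad = [(p - y_true) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0
x = [0.5, 0.2]                       # clean score: 2*0.5 - 0.2 = 0.8 -> class 1
x_adv = fgsm_attack(x, w, b, y_true=1, eps=0.5)
score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print(score < 0)  # True: the attack flips the prediction to class 0
```

The smallest eps needed to flip a prediction is a natural per-example robustness measure, one plausible ingredient of the reliability score described above.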
Recommended skills
Basic knowledge of machine learning and Python.
Predicting Persistence of Environmental Pollutants
Project
Most chemicals that are currently produced sooner or later end up in the environment, many of them in rivers and other waters. It is essential to know their fate in terms of transformations and persistence. Harmful chemicals that degrade quickly might pose little threat to the environment; persistent toxic compounds, however, can have a lasting negative impact. We will go beyond the prediction of specific biodegradation products, as done in state-of-the-art metabolic prediction systems (such as enviPath), and aim to predict reaction rates, that is, how long pollutants and their metabolites persist in the environment. We will develop and train machine learning models that use data on metabolic reactions under given environmental conditions to predict reaction rates and the half-life of compounds.
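Under the standard simplifying assumption of first-order degradation kinetics, a predicted rate constant k translates directly into a half-life via t1/2 = ln(2)/k. The sketch below shows that conversion; the rate constant is hypothetical, and a real model would predict k from compound structure and environmental conditions.

```python
import math

def half_life(k):
    """Half-life of a compound degrading with first-order rate constant k
    (per day): t_1/2 = ln(2) / k."""
    return math.log(2) / k

def remaining_fraction(k, t):
    """Fraction of the compound left after t days of first-order decay:
    C(t)/C(0) = exp(-k * t)."""
    return math.exp(-k * t)

k = 0.1  # hypothetical rate constant, per day
t_half = half_life(k)
print(round(t_half, 2))                         # 6.93
print(round(remaining_fraction(k, t_half), 2))  # 0.5
```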
Recommended skills
Basic knowledge of chemistry, machine learning, and Python.
Design for Degradability – In-Silico Development of Sustainable Chemicals
Project
An important aspect in the development of novel chemicals is their environmental fate, that is, their ability to degrade when released into the environment. The goal is to design compounds that fulfil a certain function (for example, as medication or pesticides) while at the same time degrading quickly into harmless metabolites. We will develop new algorithms to achieve this, evaluating them on large databases of existing compounds. We will use standard machine learning models for predicting degradation products and pathways (see enviPath). Our approach will start with existing compounds and transform them using adversarial methods and generative models (such as GANs) so that their degradability increases while their original function is preserved.
Recommended skills
Basic knowledge of chemistry, machine learning, and Python.
Adversarial Time Series
Project
Adversarial machine learning is a field of machine learning that focuses on exploiting model vulnerabilities using information obtainable from the model. Studying a model's weaknesses to adversarial attacks not only helps the researcher understand the model itself, but also allows them to defend against malicious attacks and prevent potentially fatal consequences after deployment. Adversarial machine learning was first proposed in the image classification domain, where an attack fools a model into misclassifying an image by adding carefully crafted noise that is hardly detectable by a human. Recently, adversarial methods have been introduced that target time series tasks. We will develop and evaluate new adversarial attacks on time series, targeting specific time series challenges beyond forecasting.
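As a minimal illustration of the image-domain idea carried over to time series, the sketch below flips a toy mean-threshold classifier by spreading one small shift across the whole series, so that no single point changes dramatically. Both the classifier and the data are invented for illustration; real attacks target learned models and subtler perturbation budgets.

```python
def mean_classifier(series, threshold=0.5):
    """Toy time-series classifier: label 1 if the mean exceeds the threshold."""
    return 1 if sum(series) / len(series) > threshold else 0

def adversarial_shift(series, threshold=0.5, margin=1e-6):
    """Craft a perturbation that flips the mean-threshold classifier.
    Spreading the same small shift across all points keeps each individual
    change modest, mimicking attacks designed to be hard to spot by eye."""
    mean = sum(series) / len(series)
    target = threshold + margin if mean <= threshold else threshold - margin
    delta = target - mean  # identical small shift applied to every point
    return [x + delta for x in series]

series = [0.6, 0.7, 0.65, 0.62]     # mean 0.6425 -> class 1
attacked = adversarial_shift(series)
print(mean_classifier(series), mean_classifier(attacked))  # 1 0
```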
Recommended skills
Basic knowledge of machine learning and Python.
Do Neural Networks Pay Off?
Project
For a while now, we have seen the trend that neural networks are vastly popular, and a large portion of machine learning research is dedicated to achieving minor gains in accuracy at huge power costs. We hypothesise that, given the same love and care (in terms of nifty pre-processing strategies and the like), traditional machine learning methods have the potential to achieve similar accuracy while consuming less power. A few questions we are interested in are the following:
- Using the same pre-processing techniques, can traditional techniques achieve a similar performance? For which type of dataset does it work?
- There is a trade-off between the accuracy and the number of parameters or layers (as a proxy for the power consumption), and we can expect the last bit of accuracy to be the costliest. Can we find a more sustainable way to stop at a point where we sacrifice a little accuracy to save power?
- If we compare traditional ML methods to NNs while allowing the same number of parameters, what do we observe?
- There is a myth that only NNs can perform well on certain types of data (such as images). Can we transfer the special tricks NNs use on this data type to traditional ML methods?
Recommended skills
A basic understanding of machine learning and Python.
Image compression to support image processing
Project
This project aims to investigate the potential benefits of using our newly developed image compression technique, based on multivariate trees, to enhance image processing machine learning models. The objective is to explore whether employing this technique can lead to faster and more efficient training of these models, requiring fewer iterations, layers, and parameters. While previous research has shown improvements using superpixels, our approach offers a substantially more lightweight and simpler solution, reducing storage requirements while maintaining performance. Through this project, the student will conduct empirical evaluations, comparing the performance of models trained on compressed images versus uncompressed ones, and analyse the impact on training time, convergence rate, and model accuracy.
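The group's multivariate-tree codec is not shown here, but the classic quadtree gives a feel for how tree-based compression merges uniform regions into single nodes instead of storing every pixel. The sketch below is only a stand-in for the actual technique, on an invented 4x4 image.

```python
def quadtree(image, x0, y0, size, tol=0.0):
    """Recursively split a size x size greyscale block (size a power of two)
    into quadrants until each block's values differ by at most `tol`;
    return ("leaf", mean) nodes or ("node", [four children])."""
    vals = [image[y][x] for y in range(y0, y0 + size)
                        for x in range(x0, x0 + size)]
    if max(vals) - min(vals) <= tol:
        return ("leaf", sum(vals) / len(vals))
    half = size // 2
    return ("node", [quadtree(image, x0 + dx, y0 + dy, half, tol)
                     for dy in (0, half) for dx in (0, half)])

def count_leaves(tree):
    """Number of stored values; fewer leaves means better compression."""
    if tree[0] == "leaf":
        return 1
    return sum(count_leaves(child) for child in tree[1])

# A 4x4 image, uniform except one pixel, compresses to 7 leaves
# instead of 16 raw pixel values.
img = [[0] * 4 for _ in range(4)]
img[0][0] = 9
print(count_leaves(quadtree(img, 0, 0, 4)))  # 7
```

Feeding a model the leaf values (or a coarse reconstruction) rather than raw pixels is the kind of input reduction whose effect on training time and accuracy the project would measure empirically.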
Recommended skills
A basic understanding of machine learning and Python.
A Machine Learning-based news recommender
Project
Service providers aim to offer an excellent experience to their customers by prefetching data likely to be accessed by them and storing it at locations close to the users. In this project, we aim to investigate the use of deep reinforcement learning in predicting customer behaviour and aiding service providers in identifying the data relevant for prefetching.
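As a minimal sketch of the reinforcement learning framing (state = last requested item, action = item to prefetch, reward = prefetch hit), the toy below uses tabular Q-learning on an invented cyclic access pattern. A deep RL agent, as targeted by the project, would replace the table with a neural network; all names and hyperparameters here are illustrative.

```python
import random

def train_prefetcher(trace, items, episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning sketch of prefetching: reward 1.0 whenever the
    prefetched item matches the user's next request. Returns the greedy
    policy mapping each item to the item to prefetch after it."""
    random.seed(0)  # reproducible exploration
    q = {(s, a): 0.0 for s in items for a in items}
    for _ in range(episodes):
        for i in range(len(trace) - 1):
            s, nxt = trace[i], trace[i + 1]
            # epsilon-greedy action selection
            a = random.choice(items) if random.random() < eps else \
                max(items, key=lambda it: q[(s, it)])
            reward = 1.0 if a == nxt else 0.0
            best_next = max(q[(nxt, it)] for it in items)
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return {s: max(items, key=lambda it: q[(s, it)]) for s in items}

# A cyclic access pattern A -> B -> C -> A ...: the learned policy should
# prefetch the successor of each item.
policy = train_prefetcher(["A", "B", "C"] * 20, ["A", "B", "C"])
print(policy)  # {'A': 'B', 'B': 'C', 'C': 'A'}
```

Real request traces are stochastic and high-dimensional, which is exactly why the project turns to deep reinforcement learning rather than a lookup table.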
Requirements
Proficiency in Python programming is essential for the project. While working on this project, the student is expected to acquire knowledge in building neural networks, understanding reinforcement learning, and using existing deep learning networks, such as transformers.
Computing basin boundaries in Julia
Supervisors
Claire Postlethwaite
Matthew Egbert
Discipline
School of Computer Science
Mathematics
Project code: SCI131
Project
This project will involve using the scientific computing language Julia and implementing (previously developed) software to compute basins of attraction of various attractors, such as equilibria and periodic orbits, in systems that exhibit bi-stability. You will further examine how the boundaries of the basins change as parameters are varied.
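Although the project itself will be carried out in Julia, the core idea can be sketched in a few lines of Python: integrate a bistable system from many initial conditions and record which attractor each trajectory reaches. The ODE dx/dt = x - x^3, with stable equilibria at +1 and -1 and the basin boundary at the unstable equilibrium x = 0, is a standard toy example, not the project's actual system.

```python
def attractor(x0, dt=0.01, steps=2000):
    """Integrate dx/dt = x - x**3 with forward Euler from x0 and report
    which stable equilibrium (+1 or -1) the trajectory approaches.
    Initial conditions on either side of x = 0 land in different basins."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return 1 if x > 0 else -1

# Points either side of the basin boundary converge to different attractors.
print(attractor(0.2), attractor(-0.7))  # 1 -1
```

Scanning a grid of initial conditions with this kind of classifier, and repeating it as a parameter varies, is exactly how basin boundaries and their changes would be mapped out (in Julia, and for periodic orbits as well as equilibria).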
Prerequisites: MATHS 260, and at least some programming experience.