Local AI Service
Dedicated high-performance Dell Pro Max GB10 GPU hardware for local LLM inference and testing.
The Local AI Service (LAIS) provides researchers with dedicated access to Dell Pro Max GB10 devices. These high-performance units bridge the gap between standard desktop or laptop computers and large-scale GPU clusters, allowing for the local execution and testing of massive Large Language Models (LLMs).
LAIS devices can be booked by research staff and doctoral students for project-specific purposes. The initial bookable period is one week, which can be extended weekly for up to four weeks. Longer periods are available by arrangement. Contact us if you would like to test the service and help inform its design as it evolves.
Benefits
- Data and Large Language Models are stored on-site at the University.
- Fast access to a high-performance environment for testing and prototyping.
- Dedicated access to the booked device.
- Standard service is free.
Using LAIS
LAIS supports several types of research activity, including:
- Running LLMs locally that are too large for standard 24GB or 48GB GPUs.
- Testing multi-agent systems – architectures where an ‘orchestrator’ model and multiple ‘worker’ models must coexist in the same high-speed memory pool.
- Using the dual-node setup to practice Tensor Parallelism or Pipeline Parallelism before scaling up to a full cluster.
- Evaluating the performance and appropriateness of this hardware ahead of a potential purchase.
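The dual-node parallelism practice mentioned above can be illustrated with a toy sketch. The example below simulates column-wise tensor parallelism on a matrix multiply in pure Python: each "device" holds a vertical shard of the weight matrix, computes its partial output, and the shards are concatenated (the all-gather step). All function names here are hypothetical and for illustration only; on the real hardware this sharding is handled by an inference framework, not written by hand.

```python
# Toy sketch of column-wise tensor parallelism (illustrative only).
# Each simulated "device" holds a vertical slice of the weight matrix;
# partial outputs are concatenated, mimicking an all-gather.

def matmul(x, w):
    """Plain matrix multiply: x is (n x k), w is (k x m)."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` shards."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w]
            for p in range(parts)]

def tensor_parallel_matmul(x, w, parts=2):
    """Compute x @ w by sharding w across `parts` simulated devices,
    then concatenating the column blocks of the partial results."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # one per device
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 0], [0, 1, 0, 3]]
# Sharded result matches the single-device result exactly.
assert tensor_parallel_matmul(x, w, parts=2) == matmul(x, w)
```

The same idea scales to real workloads: with two GB10 units, each node holds half of every weight matrix, and the interconnect carries the gather step that the list concatenation stands in for here.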
If you are unsure whether LAIS will fit your needs, talk to the team. We have a range of research compute services to fit different contexts.
Access
- Access follows a consultation and onboarding process.
- Available to academic staff and doctoral candidates. Postgraduate students and other groups need their supervisor or principal investigator to request LAIS.
- Project users will be set up to have SSH access to the booked device.
- Using LAIS requires researchers to be comfortable using the command line and terminal interfaces.
- Devices reside within the Centre for eResearch (on-campus) and will be accessed remotely.
- Reservations are for one week with extensions up to four weeks.
- Devices are reset (models and data cleared) between project groups.
Device specifications
- Single-unit mode: 1 petaFLOP of FP4 compute and 128GB of unified memory, best suited to hosting LLMs of around 200B parameters.
- Dual-unit mode: 2 petaFLOPs of FP4 compute and 256GB of unified memory, best suited to hosting LLMs of around 400B parameters.
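A back-of-the-envelope calculation shows why these parameter counts line up with the memory figures: at FP4, weights take roughly 0.5 bytes per parameter, so a 200B-parameter model needs about 100GB before KV cache and activations. The sketch below makes this arithmetic explicit; the 20% overhead figure is an illustrative assumption, not a specification.

```python
# Rough fit check: model weights at FP4 (0.5 bytes/param) plus an
# assumed ~20% overhead for KV cache and activations. Illustrative
# figures only -- actual headroom depends on context length and stack.

def fits_in_memory(params_billions, memory_gb,
                   bytes_per_param=0.5, overhead=0.20):
    weights_gb = params_billions * bytes_per_param
    needed_gb = weights_gb * (1 + overhead)
    return needed_gb <= memory_gb

fits_in_memory(200, 128)  # ~120GB needed: fits single-unit mode
fits_in_memory(400, 256)  # ~240GB needed: fits dual-unit mode
fits_in_memory(400, 128)  # too large for a single unit
```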
Devices are pre-configured with the NVIDIA AI Enterprise software stack. This includes standard drivers, CUDA libraries, and support for JupyterLab and Docker.
Contact
Research Data Support Services
Email: researchdata@auckland.ac.nz
Senior Platforms and Services Engineer
Email: Sean Matheny