Blindspots in AI vision revealed in study

19 February 2026

What if you took ChatGPT to an eye clinic? These scientists did.

Dr Katherine Storrs and PhD student Gene Tangtartharakul

University of Auckland scientists put commercial AI systems – ChatGPT, Claude, Gemini – through clinical tests to investigate how close they are to achieving human-level vision.

“The AIs were excellent at recognising objects, but bad at performing much simpler-seeming tasks such as comparing the lengths of lines or comparing unfamiliar shapes,” says Dr Katherine Storrs, the senior author of the study.

She runs the Computational Perception Lab in the School of Psychology.

There may be implications for scientists using the AIs as models of human vision and in robotics where detailed spatial understanding, not just object recognition, is crucial, says Storrs.

The images here are an outline of a banana and a silhouette of a bear. The AI tools recognised images such as these despite the lack of detail. — Here's the easy stuff. The AI tools recognise images such as the banana and the bear despite the lack of detail. Their training has equipped them to pick up on the shapes.

The scientists put the AI systems through tests a neurologist might use with a stroke patient.

Artificial vision is doing amazing things such as guiding self-driving cars and robots and recognising faces more accurately than humans.

“While these innovations have gotten really good, none of them requires the full range of human visual abilities,” says Gene Tangtartharakul, the PhD student who led the research.

The images are ones that the AI systems struggle with: Recognising which of the bottom two blobby shapes matches the top one and deciphering the overlapping shapes. — Here's where the AI systems struggled: Recognising which of the bottom two blobby shapes matched the top one and, on the right, deciphering the overlapping shapes.

Sometimes we need to judge the orientation of a single edge (is this picture hung on the wall perfectly level?); sometimes disentangle a cacophony of overlapping shapes (is there something hiding inside that bush?); and sometimes recognise semantic categories (is this painting by Cezanne or Picasso?).

Most of the AI tools people know, such as ChatGPT, Claude and Gemini, are both Large Language Models and Visual-Language Models; you can type into the chat box, but also show a photo and ask questions about it.

The images are of pictures used to test AI models. The AI models performed poorly in assessing whether the gaps in two circles were in the same place, but well in assessing vague drawings of spoons and a fork. — The AI models performed poorly in assessing whether the gaps in the circles were in the same place, but they were good at matching the two spoons, despite the lack of detail.

“They are great at recognising familiar objects, but poor at tasks like making fine-grained judgements about particular parts of an image, or comparing unfamiliar shapes,” says Tang.

The VLMs reached the clinical threshold for visual deficits on over a dozen of the simple and intermediate tests.

“The particular pattern of deficits would be very unusual for a human,” says Storrs. “When people have selective problems with certain visual functions, it’s usually our processing of complex things like faces.”

A human would probably be diagnosed with visual form agnosia, a broad term for difficulties in interpreting visual shapes and patterns, which can be caused by brain damage to the visual cortex, she says.

Usually, however, it also involves difficulties in recognising objects.

There are case reports of people who successfully recognise real-world objects but struggle with seemingly “simpler” tasks like identifying geometric shapes or disentangling two overlapping shapes, which is similar to the pattern of difficulties shown by the VLMs.

“But this is an incredibly rare pattern of deficits in a human,” says Storrs.

One explanation for the VLM limitations is that their training data include plenty of named objects but fewer captions that precisely describe the size, shape, and position of lines in an image – the sort of information needed to perform a lot of the “simpler” visual tasks, she says.

Their article “Visual language models show widespread visual deficits on neuropsychological tests” was published in Nature Machine Intelligence.

Media contact

Paul Panckhurst | Science media adviser
M: 022 032 8475
E: paul.panckhurst@auckland.ac.nz