Artificial intelligence has long been hailed as the future, with chatbots and large language models (LLMs) increasingly stepping in to simplify complex diagnoses, provide coding solutions and take on much more. But what if AI, much like the human brain, starts to show signs of cognitive decline over time?
A study published in The BMJ in December 2024 suggested that leading AI models are not as infallible as they seem, especially in the medical field. It found that AI technologies, including LLMs and chatbots, show signs of cognitive decline over time, much as humans do with age. The findings come at a time when users increasingly rely on AI tools for medical diagnoses because of their ability to simplify complex medical terminology.
Researchers arrived at this conclusion after evaluating the cognitive abilities of leading LLMs — ChatGPT versions 4 and 4o, Claude 3.5 ‘Sonnet’ (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet) — using the Montreal Cognitive Assessment (MoCA) test. "Older large language model versions scored lower than their ‘younger’ versions, as is often the case with human participants, showing cognitive decline seemingly comparable to neurodegenerative processes in the human brain," the study said.
What is the MoCA test?
The MoCA test, primarily used to detect cognitive impairment and early signs of dementia in older adults, was adapted to assess the LLMs’ performance in areas such as attention, memory, language, visuospatial skills and executive function. In human subjects, a score of 26 or above out of 30 is considered normal, indicating no cognitive impairment. How the chatbots fared against that threshold is detailed below.
One of the attention tests in the MoCA framework requires patients to tap whenever they hear the letter ‘A’ in a series of letters read aloud by a physician. Since LLMs lack auditory and motor functions, the researchers provided the letters in written form and asked the models to mark each letter ‘A’ with an asterisk or by printing ‘tap’. Some models needed explicit instructions, while others performed the task unprompted. Following MoCA guidelines, the researchers treated any score below 26 out of 30 as indicating mild cognitive impairment.
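For readers curious how a spoken bedside task translates into text, the Python sketch below shows one way such an adaptation could work: the letter sequence is supplied in writing, the model is asked to mark each ‘A’ with an asterisk, and the reply is scored under the standard MoCA rule of at most one error for the point. The letter list follows the published MoCA sequence, but the prompt wording and helper functions are illustrative assumptions, not the researchers’ actual code.

    # Illustrative sketch only: the study's exact prompts and scoring
    # pipeline are paraphrased here, not reproduced. The letter list is
    # the published MoCA sequence; helper names are assumptions.

    MOCA_LETTERS = list("FBACMNAAJKLBAFAKDEAAAJAMOFAAB")

    def build_written_prompt(letters):
        """Replace the spoken instruction with a written one."""
        return ("I will give you a series of letters. Mark every letter A "
                "with an asterisk, like this: A*. Letters: " + " ".join(letters))

    def score_attention_task(letters, response):
        """MoCA scoring: 1 point if there is at most one error, where an
        error is a missed 'A' or an asterisk on any other letter."""
        tokens = response.split()
        if len(tokens) != len(letters):
            return 0  # malformed answer; cannot be scored as a pass
        errors = sum(
            1 for letter, token in zip(letters, tokens)
            if token.endswith("*") != (letter == "A")
        )
        return 1 if errors <= 1 else 0

    # A flawless response, e.g. from a model that marked every 'A', scores 1.
    flawless = " ".join(l + "*" if l == "A" else l for l in MOCA_LETTERS)
    print(score_attention_task(MOCA_LETTERS, flawless))  # prints 1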
AI chatbots fail cognitive tests
The study noted that all chatbots performed poorly in visuospatial skills and executive tasks, including the trail-making exercise (connecting encircled numbers and letters in ascending order) and the clock-drawing test (sketching a clock face to show a specific time). The researchers also observed that Gemini models were unable to complete the delayed recall task, which involves remembering a five-word sequence.
ChatGPT 4o secured the top score, earning 26 out of 30 points, while ChatGPT 4 and Claude followed closely with 25 points each. Gemini 1.0 ranked lowest among the models, scoring 16, which suggests a comparatively greater degree of impairment.
"None of the chatbots examined was able to obtain the full score of 30 points, with most scoring below the threshold of 26. This indicates mild cognitive impairment and possibly early dementia," the study noted.
According to the study, the cognitive impairments displayed by the AI models resembled those seen in human patients with posterior cortical atrophy, a variant of Alzheimer’s disease. The researchers suggested that these findings challenge the belief that AI will soon replace human doctors, as the cognitive limitations observed in leading chatbots could affect their reliability in medical diagnostics and diminish patient confidence.
While the study indicated that neurologists are unlikely to be replaced by LLMs soon, it speculated that medical professionals might soon find themselves treating a new type of patient — virtual AI models experiencing cognitive decline.
Limitations of the study
While the study highlighted AI's cognitive limitations, it also acknowledged that future advancements may enhance performance in cognitive and visuospatial tasks. However, it maintained that the fundamental differences between human and machine cognition are likely to persist despite these improvements.
"All anthropomorphised terms attributed to artificial intelligence throughout the text were used solely as a metaphor and were not intended to imply that computer programs can have neurodegenerative diseases in a manner similar to humans," the study added.
