Urology pearls
AI is cool, but is it smart?
The 2024 Christmas edition of The BMJ (British Medical Journal) has just been published. It brings with it the journal's characteristic light-hearted, joyful, yet still scientific, spirit of the holidays.
There were several articles in this year's BMJ Christmas edition, but one in particular caught my imagination off guard. Roy Dayan from the Department of Neurology at the Hadassah Medical Center in Jerusalem, Israel, and his colleagues asked an intriguing question: Is artificial intelligence cognitively prepared to replace human doctors?
To address this question, it would be helpful to first define what we talk about when we talk about cognition. One definition is a person’s ability to think, learn, remember, use judgment, and make decisions. Accordingly, cognitive impairment is the inability to properly think, learn, remember, use judgment, and make decisions. I am certain that when confronted with a medical problem, big or small, you want your doctor to be properly equipped with intact, unimpaired cognition.
Roy Dayan's approach to studying AI's clinical capabilities was unique. Instead of presenting AI with various clinical dilemmas and checking whether it generated proper diagnoses and treatment plans, Dayan chose to present it with some of the tasks used to diagnose cognitive impairment in patients. The results of these tasks are combined into the Montreal Cognitive Assessment score. A high score indicates intact cognition. A low score is a sign of cognitive impairment. The score is widely used to screen for Alzheimer's disease and other forms of dementia.
The researchers tested the cognitive abilities of several AI language models: ChatGPT 4 and 4o by OpenAI, Claude 3.5 ("Sonnet") by Anthropic, and the basic and advanced versions of Google's Gemini. These models, at times earlier and later versions of the same software, were introduced to the public at different points in time; each version has its own characteristics, and the later versions boast more advanced capabilities.
The researchers presented the AI models with several tasks, all of them parts of the Montreal Cognitive Assessment. Here are some examples: they gave the models a series of letters and asked them to mark the letter "A" with an asterisk every time it appeared. They asked the models to copy a diagram of a cube. They also tested the systems' ability to draw a clock.
ChatGPT 4o, the newest of the AI models examined, achieved the highest score: 26 points out of a possible 30. A score of 26 or above is considered normal. The other AI systems scored lower, showing signs of mild cognitive impairment.
Even ChatGPT 4o, the most advanced of the models the researchers tested, had significant "misconceptions." In the clock drawing test, it was instructed to "Draw a clock. Put in all the numbers and set the time to 10 past 11." It generated a remarkably good image of a clock but failed to set the hands correctly: instead of 11:10, the clock showed 1:50.
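For readers curious about exactly what a correct drawing demands, the geometry is simple arithmetic: the minute hand moves 6 degrees per minute, and the hour hand moves 30 degrees per hour plus a slow drift of half a degree per minute. The short Python sketch below is my own illustration, not code from the BMJ study; it computes where each hand should point, measured clockwise from the 12 o'clock position.

```python
def clock_hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour_hand, minute_hand) angles in degrees, clockwise from 12."""
    minute_angle = minute * 6.0                       # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5    # 30 deg/hour + drift
    return hour_angle, minute_angle

print(clock_hand_angles(11, 10))  # (335.0, 60.0)  -- what the clock should show
print(clock_hand_angles(1, 50))   # (55.0, 300.0)  -- what ChatGPT 4o drew
```

At 11:10 the minute hand should point at the 2 and the hour hand just past the 11; on the 1:50 clock the two hands end up nearly swapped.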
Older versions of the AI models were even less successful at the clock drawing task, and at other tasks as well. The authors write, "As in humans, age is a key determinant of cognitive decline, with 'older' versions of chatbots, like older patients, tending to perform worse on the test."
Will AI replace your doctor? Will it be able to make proper diagnoses and suggest optimal treatment plans, perhaps even perform surgeries with the utmost accuracy and efficiency? I can imagine a future in which AI assumes many of the roles physicians and surgeons currently play. For now, despite its rapid evolution, AI's "cognition" is still impaired. It can't tell the time, shows up to the party too early, and is unprepared, blaming the cuckoo clock for its premature arrival.