In a first-of-its-kind warning about the role of Artificial Intelligence (AI) in making sense of critical health data, a team of US researchers has said that AI tools in the medical space must be carefully tested for performance across a wide range of populations, as deep learning models may fall short.
The findings should give pause to those considering rapid deployment of AI platforms without rigorously assessing their performance in real-world clinical settings that reflect where they are being deployed, observed the team from the Icahn School of Medicine at Mount Sinai.
AI tools trained to detect pneumonia on chest X-rays suffered significant decreases in performance when tested on data from outside health systems, according to the study published in a special issue of PLOS Medicine on machine learning and health care.
These findings suggest that the deep learning models may not perform as accurately as expected.
"Deep learning models trained to perform medical diagnosis can generalise well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions," said Senior Author Eric Oermann, MD, Instructor in Neurosurgery at the Icahn School of Medicine at Mount Sinai.
To reach this conclusion, the researchers assessed how AI models identified pneumonia in 158,000 chest X-rays across three medical institutions: the National Institutes of Health, The Mount Sinai Hospital and Indiana University Hospital.
In three out of five comparisons, the performance of the convolutional neural networks (CNNs) in diagnosing diseases on X-rays from hospitals outside their own network was significantly lower than on X-rays from the original health system.
However, the CNNs were able to detect the hospital system where an X-ray was acquired with a high degree of accuracy, and they "cheated" at their predictive task by relying on the prevalence of pneumonia at the training institution.
"If AI systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios, and carefully assessed to determine how they impact accurate diagnosis," explained the study's first author, John Zech.
--IANS
Disclaimer: No Business Standard Journalist was involved in creation of this content