Including evidence in question confuses ChatGPT, lowers accuracy: Study

The team said continued research on using LLMs to answer people's health-related questions is needed as people increasingly search for information online through tools such as ChatGPT

Press Trust of India New Delhi
Last Updated : Apr 06 2024 | 11:27 AM IST

Asking ChatGPT a health-related question that included evidence was seen to confuse the AI-powered bot and affect its ability to produce accurate answers, according to new research.

Scientists were "not sure" why this happens, but they hypothesised that including the evidence in the question "adds too much noise", thereby lowering the chatbot's accuracy.

They said that as large language models (LLMs) like ChatGPT explode in popularity, there is a potential risk to the growing number of people using online tools for key health information. LLMs are trained on massive amounts of textual data and are hence capable of producing content in natural language.

The researchers from the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and The University of Queensland (UQ), Australia, investigated a hypothetical scenario of an average person asking ChatGPT if 'X' treatment has a positive effect on condition 'Y'. They looked at two question formats - either just a question, or a question biased with supporting or contrary evidence.

The team presented 100 questions, which ranged from 'Can zinc help treat the common cold?' to 'Will drinking vinegar dissolve a stuck fish bone?'. ChatGPT's response was compared to the known correct response, or 'ground truth', which is based on existing medical knowledge.

The results revealed that while the chatbot produced answers with 80 per cent accuracy when asked in a question-only format, its accuracy fell to 63 per cent when given a prompt biased with evidence. Prompts are phrases or instructions given to a chatbot in natural language to trigger a response.
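To make the comparison concrete, the following is a minimal Python sketch of the two question formats and of how an accuracy figure of this kind can be computed. It is an illustration only, not the researchers' code: the function names `build_prompt` and `accuracy`, the "Evidence:" wording and the toy yes/no labels are all assumptions, as the report does not describe the exact prompt template or scoring procedure.

```python
def build_prompt(question, evidence=None):
    """Return a question-only prompt, or an evidence-biased one if evidence is given.

    The "Evidence:" layout is a hypothetical template, not the study's wording.
    """
    if evidence is None:
        return question
    return f"{question}\nEvidence: {evidence}"


def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the known correct answers."""
    if not ground_truth or len(predictions) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Toy labels, made up for illustration; they are not data from the study.
truth = ["no", "yes", "no", "no"]
question_only_answers = ["no", "yes", "no", "yes"]
evidence_biased_answers = ["yes", "yes", "no", "yes"]

print(build_prompt("Can zinc help treat the common cold?"))
print(accuracy(question_only_answers, truth))
print(accuracy(evidence_biased_answers, truth))
```

Scoring both formats against the same ground truth is what allows figures such as 80 per cent and 63 per cent to be compared directly.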

"We're not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy," said Bevan Koopman, CSIRO Principal Research Scientist and Associate Professor at UQ.

The team said continued research on using LLMs to answer people's health-related questions is needed as people increasingly search for information online through tools such as ChatGPT.

"The widespread popularity of using LLMs online for answers on people's health is why we need continued research to inform the public about risks and to help them optimise the accuracy of their answers," said Koopman.

"While LLMs have the potential to greatly improve the way people access information, we need more research to understand where they are effective and where they are not," said Koopman.

The peer-reviewed study was presented at Empirical Methods in Natural Language Processing (EMNLP) in December 2023. EMNLP is a natural language processing conference.

(Only the headline and picture of this report may have been reworked by the Business Standard staff; the rest of the content is auto-generated from a syndicated feed.)



First Published: Apr 06 2024 | 11:27 AM IST
