AI tool xFakeSci achieves 94% accuracy in identifying fake research papers

In a set of 300 fake and real scientific papers, the AI-based tool, named 'xFakeSci', detected up to 94 per cent of the fake ones

Press Trust of India New Delhi
Last Updated: Sep 04 2024 | 7:17 PM IST

Researchers have developed a tool that can tell an original research article apart from one created by AI chatbots, including ChatGPT.

In a set of 300 fake and real scientific papers, the AI-based tool, named 'xFakeSci', detected up to 94 per cent of the fake ones.

This was nearly twice the success rate seen among the more common data-mining techniques, the authors from the State University of New York, US, and Hefei University of Technology, China, said.

"... we introduce xFakeSci, a novel learning algorithm, that is capable of distinguishing ChatGPT-generated articles from publications produced by scientists," they wrote in the study published in the journal Scientific Reports.

To develop the AI-based algorithm, the researchers built two distinct datasets. One contained almost 4,000 scientific articles taken from PubMed, an open database of biomedical and life sciences research papers maintained by the US National Institutes of Health.

The other consisted of 300 fake articles, which the researchers created using ChatGPT.

"I tried to use exact same keywords that I used to extract the literature from the PubMed database, so we would have a common basis of comparison. My intuition was that there must be a pattern exhibited in the fake world versus the actual world, but I had no idea what this pattern was," the study's co-author Ahmed Abdeen Hamed, a visiting research fellow at the State University of New York, said.

Of the 300 articles, 100 each related to Alzheimer's disease, cancer and depression. Each group of 100 comprised 50 chatbot-created articles and 50 authentic abstracts taken from PubMed.

The xFakeSci algorithm was trained on the first dataset containing scientific papers and then tested for its performance on the second one.

"The xFakeSci algorithm achieved (accuracy) scores ranging from 80 to 94 per cent, outperforming common data mining algorithms, which scored (accuracy) values between 38 and 52 per cent," the authors wrote.
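To make the accuracy figures concrete, here is a minimal sketch of how such a score is computed: the share of articles whose predicted label matches the true label. The labels below are invented purely for illustration and are not data from the study; the `accuracy` helper is a hypothetical name, not part of xFakeSci.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical balanced set: 50 chatbot-written and 50 authentic articles.
actual = ["fake"] * 50 + ["real"] * 50

# Invented predictions: 3 fakes missed and 3 real articles wrongly flagged.
predicted = ["fake"] * 47 + ["real"] * 3 + ["real"] * 47 + ["fake"] * 3

score = accuracy(predicted, actual)
print(f"accuracy: {score:.0%}")
```

With 94 of 100 labels correct, this toy example lands at the top of the range the authors report; a classifier scoring 38 to 52 per cent on a balanced set like this would be doing no better than a coin toss.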

xFakeSci was programmed to analyse two major features in the fake papers, according to the authors.

One was the number of bigrams, which are pairs of words that commonly appear together, such as 'climate change', 'clinical trials' or 'biomedical literature'. The second was how those bigrams are linked to other words and concepts in the text, they said.
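The two features can be illustrated with a rough sketch, which is not the xFakeSci algorithm itself. It assumes a simple definition, every pair of adjacent words counts as a bigram, and treats each bigram as a link between its two words so that a word's "connectedness" is the number of distinct words it is paired with. The function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter, defaultdict


def bigrams(text):
    """Return adjacent word pairs from lowercased text.

    Simplification: punctuation is dropped, so pairs may span sentences.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def bigram_stats(text):
    """Count each distinct bigram and how many neighbours each word has."""
    counts = Counter(bigrams(text))
    # Treat every distinct bigram as an undirected link between two words.
    neighbours = defaultdict(set)
    for first, second in counts:
        if first != second:
            neighbours[first].add(second)
            neighbours[second].add(first)
    degree = {word: len(linked) for word, linked in neighbours.items()}
    return counts, degree


sample = "Clinical trials test new drugs. Clinical trials need volunteers."
counts, degree = bigram_stats(sample)

print("distinct bigrams:", len(counts))
print("'clinical trials' occurrences:", counts[("clinical", "trials")])
print("words linked to 'trials':", degree["trials"])
```

Comparing the number of distinct bigrams with how heavily each word is linked gives a crude version of the contrast the authors describe between chatbot-written and human-written text.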

"The first striking thing was that the number of bigrams were very few in the fake world, but in the real world, the bigrams were much more rich. Also, in the fake world, despite the fact that [there] were very few bigrams, they were so connected to everything else," Hamed said.

The authors proposed that the writing style adopted by an AI is different from that of a human researcher because the two do not have the same goals while producing a piece on a given topic.

"Because ChatGPT is still limited in its knowledge, it tries to convince you by using the most significant words," Hamed said.

"It is not the job of a scientist to make a convincing argument to you. A real research paper reports honestly about what happened during an experiment and the method used. ChatGPT is about depth on a single point, while real science is about breadth," Hamed said.


(Only the headline and picture of this report may have been reworked by the Business Standard staff; the rest of the content is auto-generated from a syndicated feed.)

First Published: Sep 04 2024 | 7:17 PM IST