AI analysis shows fake studies may be flooding cancer research worldwide
A machine learning study has flagged more than 260,000 cancer research papers worldwide as potentially linked to paper mills, raising serious concerns about the reliability and integrity of cancer science.
Nearly 1 in 10 cancer papers flagged as fake in major AI-led study.
4 min read | Last Updated: Feb 09 2026, 11:23 AM IST
What if some of the cancer research we trust, cite, and build treatments on was never real to begin with?
According to a new large-scale study, that troubling possibility may be far more widespread in cancer science than previously believed.
Published in the medical journal The BMJ, the study, titled “Machine learning-based screening of potential paper mill publications in cancer research: methodological and cross-sectional study”, analysed 2.6 million cancer research papers published between 1999 and 2024. Using a machine learning tool, the researchers found that nearly ten per cent of these papers show strong textual similarities to studies known to have been produced by so-called “paper mills”: organisations that sell fabricated or low-quality research for profit. The findings suggest the problem is not marginal and may be undermining the evidence base of cancer science.
What are ‘paper mills’ in scientific research?
Paper mills are commercial operations that manufacture scientific papers on demand. For a price, they can offer researchers anything from authorship slots to fully written manuscripts, often complete with fabricated data, images and references.
To operate at scale, these organisations rely on templates, recycling sentence structures, phrases and study designs while swapping in different genes, proteins or cancer types. The result can appear legitimate at first glance but may be fundamentally unreliable.
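To see why template-driven writing leaves a detectable trace, consider a minimal, invented illustration. The sentence skeleton, gene names and pathways below are made up for demonstration and are not drawn from any real paper; the point is only that swapping a few terms into a fixed template produces texts that share most of their wording:

```python
# Invented illustration: one sentence skeleton, different genes and
# cancer types swapped in, as a paper mill might do at scale.
TEMPLATE = ("{gene} promotes proliferation and invasion of {cancer} "
            "cancer cells by activating the {pathway} signalling pathway.")

variants = [
    TEMPLATE.format(gene="GENE-A", cancer="gastric", pathway="PI3K/AKT"),
    TEMPLATE.format(gene="GENE-B", cancer="colorectal", pathway="Wnt"),
]

def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity: fraction of words shared between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Despite reporting "different" findings, the two texts overlap heavily.
print(round(word_overlap(*variants), 2))
```

Two texts written independently by different scientists would rarely share this much structure, which is exactly the kind of fingerprint a language model can learn to spot.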
How did the AI study identify suspicious cancer papers?
According to the study, the researchers set out to test whether machine learning could reliably detect paper-mill products at scale and, if so, how widespread the problem is in cancer research.
Instead of analysing images or raw data, which are difficult to access across millions of papers, the team examined whether language patterns alone could reveal misconduct.
They trained a language model called BERT, a widely used artificial intelligence system for text analysis, on more than 2,000 cancer papers that had already been retracted for paper-mill involvement.
Crucially, the model analysed only titles and abstracts, the most visible parts of a paper. The idea was that template-driven writing leaves behind linguistic “fingerprints”. When tested, the model correctly identified suspicious papers around 91 per cent of the time, functioning much like a scientific spam filter.
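The approach can be sketched in miniature. The example below is not the authors' BERT model; it is a far simpler stand-in (a TF-IDF bag-of-words classifier from scikit-learn) trained on a handful of invented titles, purely to show how a text classifier can learn to separate template-like writing from ordinary research prose:

```python
# Simplified stand-in for the study's method (which fine-tuned BERT on
# titles/abstracts of retracted paper-mill papers). All texts are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

mill_like = [  # label 1: formulaic, template-style titles
    "GENE-A promotes proliferation and invasion of gastric cancer cells via PI3K/AKT",
    "GENE-B promotes proliferation and invasion of colorectal cancer cells via Wnt",
    "GENE-C promotes proliferation and migration of lung cancer cells via MAPK",
]
ordinary = [   # label 0: varied, non-templated titles
    "Long-term survivorship outcomes after adjuvant therapy in early breast cancer",
    "Population-based trends in colorectal cancer screening uptake across regions",
    "Health system costs of cancer care delivery in rural settings",
]
texts = mill_like + ordinary
labels = [1] * len(mill_like) + [0] * len(ordinary)

# Vectorise the text and fit a linear classifier on the word patterns.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# An unseen title reusing the same skeleton should be flagged (label 1).
probe = "GENE-D promotes proliferation and invasion of liver cancer cells via JAK/STAT"
print(clf.predict([probe])[0])
```

The real system works on the same principle but at vastly greater scale, with a deep language model trained on more than 2,000 confirmed paper-mill retractions rather than a toy word-frequency model.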
How widespread are suspected fake papers in cancer research?
When applied to the full cancer research corpus, the model flagged 261,245 papers, about 9.87 per cent of all original cancer research articles, as having strong similarities to known paper-mill publications.
The analysis showed a sharp rise over time. In the early 2000s, flagged papers accounted for roughly one per cent of cancer research. By 2022, that figure had climbed to over 16 per cent, indicating rapid growth over the past two decades.
Are top cancer journals affected too?
The researchers found that the problem is not limited to smaller or obscure journals. By 2022, more than ten per cent of papers published in top cancer journals were flagged.
This challenges the assumption that prestige alone guarantees reliability and suggests paper-mill content may be slipping through even rigorous peer-review systems.
Which areas of cancer research are most affected?
The study found that flagged papers were concentrated far more heavily in some areas of cancer research than in others.
By contrast, areas such as cancer survivorship, population studies, health systems and policy research showed much lower levels of suspicious publications.
The authors stress that being flagged is not proof of misconduct. It simply indicates that a paper’s language closely resembles known paper-mill products and warrants closer human scrutiny. Some papers may prove genuine, while others may reflect different forms of error or misconduct.
The tool is designed to support editors and reviewers, not replace them, the study emphasises. According to the authors, the machine learning system is already being piloted by several journals to screen submissions before peer review. The researchers plan to refine the model, expand it to other scientific fields, and update it continuously as new paper-mill cases are confirmed.