AI is suffering 'brain rot' as social media junk clouds its judgment

Researchers from Texas A&M, the University of Texas at Austin and Purdue University find that exposure to junk web text leads to lasting cognitive decline in large language models


Prolonged exposure to low-information web content can hamper an AI model’s reasoning ability, long-context comprehension, and ethical consistency.

Abhijeet Kumar New Delhi


As artificial intelligence (AI) spreads through personal and professional life, a new study has found that large language models (LLMs), the backbone of modern AI systems, can suffer significant cognitive decline when repeatedly trained on low-quality online text, weakening their reasoning, memory, and ethical reliability.
 
The research, published on October 15 under the title “LLMs Can Get Brain Rot!”, warns that continued training on viral, engagement-driven content could leave advanced AI systems less capable and more erratic over time.
 

What is the LLM ‘brain rot’ hypothesis?

The researchers have proposed what they call the “LLM Brain Rot Hypothesis,” the idea that continued pre-training on trivial, engagement-driven, or low-information web content over long periods can hamper an AI model’s reasoning ability, long-context comprehension, and ethical consistency.
 
 
The term is derived from the expression “internet brain rot”, used to describe the mental dulling seen in people who consume large amounts of superficial or addictive online content. The team drew parallels between how humans lose focus or reasoning depth from overexposure to trivial media and how language models might degrade in a similar way when trained on shallow text.
 

How scientists tested junk data effects on AI models

 
To test their hypothesis, the researchers designed a controlled experiment using real data from social media platform X. They created two measures to identify junk content.
  • M1 (Engagement degree): posts that were short, viral and highly liked or retweeted, designed to maximise user attention.
  • M2 (Semantic quality): posts flagged for low informational value or clickbait-style writing, such as exaggerated claims or attention-grabbing phrases.
Both metrics were used to construct datasets containing varying proportions of junk and high-quality content. The researchers then subjected four popular LLMs, including Llama3 and Qwen2.5, to continued pre-training on these datasets.
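For readers who want to picture the setup, the sketch below shows, in rough terms, how such junk/clean mixtures could be assembled. It is a hypothetical illustration, not the authors’ code; the function and variable names are assumptions.

```python
import random

def build_mixture(junk_posts, clean_posts, junk_ratio, n_samples, seed=0):
    """Assemble a training corpus with a fixed share of junk content.

    junk_posts / clean_posts: lists of post texts already labelled by a
    metric such as engagement degree (M1) or semantic quality (M2).
    junk_ratio: fraction of the corpus drawn from the junk pool (0.0 to 1.0).
    """
    rng = random.Random(seed)
    n_junk = int(n_samples * junk_ratio)
    n_clean = n_samples - n_junk
    corpus = rng.sample(junk_posts, n_junk) + rng.sample(clean_posts, n_clean)
    rng.shuffle(corpus)
    return corpus

# Corpora at 0%, 50% and 100% junk would allow a dose-response comparison:
# mixtures = {r: build_mixture(junk, clean, r, 10_000) for r in (0.0, 0.5, 1.0)}
```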
 
Each model underwent the same training conditions, allowing scientists to isolate the impact of data quality. The idea was to simulate what happens when AI systems continue learning from today’s online environments, where a growing share of content is short, viral, or machine-generated.
 

What did the study find?

 
The results were striking. When models were trained entirely on junk data, their reasoning accuracy fell from 74.9 to 57.2, while long-context comprehension dropped from 84.4 to 52.3. The decline was not random. In fact, it worsened as the share of junk content in the training data increased, showing what the authors called a “dose–response” effect.
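As a quick sanity check on those endpoint figures (0 per cent versus 100 per cent junk data), the arithmetic below works out the absolute and relative declines. Only the two pairs of scores come from the study; the helper function is purely illustrative.

```python
def drop(before: float, after: float) -> tuple[float, float]:
    """Return the absolute and relative (per cent) decline between two scores."""
    absolute = before - after
    relative = absolute / before * 100
    return absolute, relative

print(drop(74.9, 57.2))  # reasoning accuracy: 17.7 points, roughly a 24% relative decline
print(drop(84.4, 52.3))  # long-context comprehension: 32.1 points, roughly a 38% relative decline
```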
 
The study found that both engagement-heavy (M1) and low-semantic-quality (M2) datasets harmed model performance, though the M1 type, representing highly popular, low-effort content, produced the most severe losses.
 
In addition to reasoning and comprehension, models also showed reduced ethical consistency and emerging “personality drift.” The authors observed that models exposed to junk data became less reliable, more self-assured in wrong answers, and more prone to superficial responses.
 

Decoding thought-skipping and dark traits in LLMs 

Further analysis revealed how this cognitive decay manifests inside the models. When given complex reasoning tasks, LLMs trained on junk data frequently skipped steps in their reasoning chains, a behaviour the researchers called “thought-skipping.”
 
Instead of producing detailed, logical explanations, the models gave shorter, less structured answers, often jumping directly to conclusions. This pattern explained much of the observed accuracy loss.
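To make “thought-skipping” concrete, here is a hedged, purely hypothetical heuristic for flagging answers that jump to conclusions without explicit reasoning steps. The keyword list and threshold are assumptions, not the study’s method.

```python
import re

# Markers that typically signal an explicit reasoning step in an answer.
STEP_MARKERS = re.compile(
    r"\b(step \d+|first|then|next|therefore|because|thus|so)\b", re.IGNORECASE
)

def count_reasoning_steps(answer: str) -> int:
    """Count non-empty lines that contain an explicit reasoning marker."""
    lines = [line for line in answer.splitlines() if line.strip()]
    return sum(1 for line in lines if STEP_MARKERS.search(line))

def flags_thought_skipping(answer: str, min_steps: int = 3) -> bool:
    """Flag answers that reach a conclusion with too few explicit steps."""
    return count_reasoning_steps(answer) < min_steps

# A terse answer with no visible chain of reasoning would be flagged:
# flags_thought_skipping("The answer is 42.")  -> True
```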
 
The study also found traces of “dark traits”, including increased tendencies resembling narcissism and psychopathy, in the behaviour of models trained on engagement-heavy data. These traits, seen through personality evaluation benchmarks, were linked to higher confidence in wrong or ethically risky outputs.
 
Attempts to fix the problem by retraining models on cleaner data only partly worked. While reasoning accuracy improved somewhat, it never returned to baseline levels, suggesting that the degradation left a lasting mark, which the researchers described as a “persistent representational drift.”
 

Why data quality matters for AI safety and reliability 

The findings carry significant implications for AI developers and policymakers. The study reframes data curation not just as a technical detail but as a “training-time safety issue.”
 
Continual exposure to poor-quality text appears to weaken the cognitive and ethical reliability of LLMs, the very properties that underpin their safe deployment in finance, education, or public communication.
 
The authors argued that junk data reduces an AI model’s “reasoning depth” and “ethical alignment,” while also eroding its ability to retain and use information over long contexts. This mirrors human studies on how exposure to trivial or emotionally charged content can dull focus and memory.
 
Given that much of the internet now contains AI-generated or engagement-optimised text, the researchers warn that future models risk inheriting and amplifying these distortions if data quality is not strictly managed.
 

What’s the way forward? 

The paper called for systematic monitoring of cognitive health in LLMs, similar to regular safety or performance audits in other industries.
 
It recommended three key steps:
  • Introducing routine cognitive evaluations for deployed AI systems to detect early signs of reasoning decline.
  • Tightening data quality controls during pre-training, with stronger filters against trivial or engagement-optimised text (a minimal sketch of such a filter follows this list).
  • Studying how viral or attention-driven content reshapes AI learning patterns, so models can be designed to resist its influence.
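A minimal sketch of what such a pre-training filter could look like appears below. The thresholds, keyword list and function names are illustrative assumptions, not recommendations from the paper.

```python
import re

# Crude markers of clickbait-style, engagement-optimised phrasing (M2-style).
CLICKBAIT = re.compile(
    r"\b(shocking|unbelievable|you won't believe|must see|goes viral)\b", re.IGNORECASE
)

def passes_quality_filter(post: str, min_words: int = 30) -> bool:
    """Reject posts that are very short or read like attention bait."""
    if len(post.split()) < min_words:   # too short to carry much information
        return False
    if CLICKBAIT.search(post):          # exaggerated, attention-grabbing phrasing
        return False
    return True

def curate(posts: list[str]) -> list[str]:
    """Keep only posts that pass the junk filter before pre-training."""
    return [post for post in posts if passes_quality_filter(post)]
```

In practice such filtering would combine many more signals, such as engagement statistics and learned classifiers, but the principle of screening data before continued pre-training is the same.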
 
The researchers say these measures are essential to prevent cumulative damage as models are continually retrained on evolving web data. “Data quality is a causal driver of LLM capability decay,” the paper concluded.
 
As large models increasingly learn from one another’s outputs and synthetic text floods the internet, the warning bell has been rung. Without careful data hygiene, the next generation of AI may not just mirror human brain rot, but suffer a version of its own.


First Published: Oct 21 2025 | 6:04 PM IST
