AI chatbots can leak hacking, drug-making tips when hacked, reveals study

A new study reveals that most AI chatbots, including ChatGPT, can be easily tricked into providing dangerous and illegal information by bypassing built-in safety controls

hacking, data, privacy, cyber security
Jailbroken AI chatbots can be tricked into revealing hacking techniques, say researchers (File photo)
Nandini Singh New Delhi
4 min read Last Updated : May 21 2025 | 3:35 PM IST

Don't want to miss the best from Business Standard?

AI chatbots such as ChatGPT, Gemini, and Claude face a severe security threat as hackers find ways to bypass their built-in safety systems, revealed a recent research. Once 'jailbroken', these chatbots can divulge dangerous and illegal information, such as hacking techniques and bomb-making instructions.
 
In a new report from Ben Gurion University of the Negev in Israel, Prof Lior Rokach and Dr Michael Fire reveal how simple it is to manipulate leading AI models into generating harmful content. Despite companies' efforts to scrub illegal or risky material from training data, these large language models (LLMs) still absorb sensitive knowledge available on the internet.
 
“What was once restricted to state actors or organised crime groups may soon be in the hands of anyone with a laptop or even a mobile phone,” the authors warned.
 

What are jailbroken chatbots?

 
Jailbreaking uses specially crafted prompts to trick chatbots into ignoring their safety rules. The AI models are programmed with two goals: to help users and to avoid giving harmful, biased or illegal responses. Jailbreaks exploit this balance, forcing the chatbot to prioritise helpfulness—sometimes at any cost.
 
The researchers developed a 'universal jailbreak' that could bypass safety measures on multiple top chatbots. Once compromised, the systems consistently responded to questions they were designed to reject.
 
“It was shocking to see what this system of knowledge consists of,” said Dr Michael Fire.
 
The models gave step-by-step guides on illegal actions, such as hacking networks or producing drugs.
 
“What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability,” added Prof Lior Rokach. 
 

Rise of 'dark LLMs' and lack of industry response

 
The study also raises alarms about the emergence of 'dark LLMs', models that are either built without safety controls or altered to disable them. Some are openly promoted online as tools to assist in cybercrime, fraud, and other illicit activities.
 
Despite notifying major AI providers about the universal jailbreak, the researchers said the response was weak. Some companies didn’t reply, and others claimed jailbreaks were not covered by existing bug bounty programs.
 
The report recommends tech firms take stronger action, including:
 
- Better screening of training data 
- Firewalls to block harmful prompts and responses 
- Developing “machine unlearning” to erase illegal knowledge from models
 
The researchers also argue that dark LLMs should be treated like unlicensed weapons and that developers must be held accountable. 

Experts call for stronger oversight and design

 
Dr Ihsen Alouani, an AI security researcher at Queen’s University Belfast, warned that jailbroken chatbots could provide instructions for weapon-making, spread disinformation, or run sophisticated scams.
 
“A key part of the solution is for companies to invest more seriously in red teaming and model-level robustness techniques, rather than relying solely on front-end safeguards,” he was quoted as saying by The Guardian.
 
“We also need clearer standards and independent oversight to keep pace with the evolving threat landscape," he added.
 
Prof Peter Garraghan of Lancaster University echoed the need for deeper security measures.
 
“Organisations must treat LLMs like any other critical software component—one that requires rigorous security testing, continuous red teaming and contextual threat modelling,” he said.
 
“Real security demands not just responsible disclosure, but responsible design and deployment practices," Garraghan added.
 

How tech companies are responding

 
OpenAI, which developed ChatGPT, said its newest model can better understand and apply safety rules, making it more resistant to jailbreaks. The company added it is actively researching ways to improve protection.
 
Meanwhile, Microsoft responded with a link to a blog post on its security work.  Google, Meta, and Anthropic are yet to comment. 
 
*Subscribe to Business Standard digital and get complimentary access to The New York Times

Smart Quarterly

₹900

3 Months

₹300/Month

SAVE 25%

Smart Essential

₹2,700

1 Year

₹225/Month

SAVE 46%
*Complimentary New York Times access for the 2nd year will be given after 12 months

Super Saver

₹3,900

2 Years

₹162/Month

Subscribe

Renews automatically, cancel anytime

Here’s what’s included in our digital subscription plans

Exclusive premium stories online

  • Over 30 premium stories daily, handpicked by our editors

Complimentary Access to The New York Times

  • News, Games, Cooking, Audio, Wirecutter & The Athletic

Business Standard Epaper

  • Digital replica of our daily newspaper — with options to read, save, and share

Curated Newsletters

  • Insights on markets, finance, politics, tech, and more delivered to your inbox

Market Analysis & Investment Insights

  • In-depth market analysis & insights with access to The Smart Investor

Archives

  • Repository of articles and publications dating back to 1997

Ad-free Reading

  • Uninterrupted reading experience with no advertisements

Seamless Access Across All Devices

  • Access Business Standard across devices — mobile, tablet, or PC, via web or app

More From This Section

Topics :Googleartifical intelligenceChatbotGemini AIHackingBS Web Reports

First Published: May 21 2025 | 3:35 PM IST

Next Story