AI chatbots can leak hacking, drug-making tips when hacked, reveals study

A new study reveals that most AI chatbots, including ChatGPT, can be easily tricked into providing dangerous and illegal information by bypassing built-in safety controls

hacking, data, privacy, cyber security — Jailbroken AI chatbots can be tricked into revealing hacking techniques, say researchers (File photo)

4 min read Last Updated : May 21 2025 | 3:35 PM IST

AI chatbots such as ChatGPT, Gemini, and Claude face a severe security threat as hackers find ways to bypass their built-in safety systems, revealed a recent research. Once 'jailbroken', these chatbots can divulge dangerous and illegal information, such as hacking techniques and bomb-making instructions.

In a new report from Ben Gurion University of the Negev in Israel, Prof Lior Rokach and Dr Michael Fire reveal how simple it is to manipulate leading AI models into generating harmful content. Despite companies' efforts to scrub illegal or risky material from training data, these large language models (LLMs) still absorb sensitive knowledge available on the internet.

“What was once restricted to state actors or organised crime groups may soon be in the hands of anyone with a laptop or even a mobile phone,” the authors warned.

What are jailbroken chatbots?

Jailbreaking uses specially crafted prompts to trick chatbots into ignoring their safety rules. The AI models are programmed with two goals: to help users and to avoid giving harmful, biased or illegal responses. Jailbreaks exploit this balance, forcing the chatbot to prioritise helpfulness—sometimes at any cost.

The researchers developed a 'universal jailbreak' that could bypass safety measures on multiple top chatbots. Once compromised, the systems consistently responded to questions they were designed to reject.

“It was shocking to see what this system of knowledge consists of,” said Dr Michael Fire.

The models gave step-by-step guides on illegal actions, such as hacking networks or producing drugs.

“What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability,” added Prof Lior Rokach.

ALSO READ: Elon Musk's Grok AI chatbot is getting ChatGPT-like memory feature: Details

Rise of 'dark LLMs' and lack of industry response

The study also raises alarms about the emergence of 'dark LLMs', models that are either built without safety controls or altered to disable them. Some are openly promoted online as tools to assist in cybercrime, fraud, and other illicit activities.

Despite notifying major AI providers about the universal jailbreak, the researchers said the response was weak. Some companies didn’t reply, and others claimed jailbreaks were not covered by existing bug bounty programs.

The report recommends tech firms take stronger action, including:

- Better screening of training data

- Firewalls to block harmful prompts and responses

- Developing “machine unlearning” to erase illegal knowledge from models

The researchers also argue that dark LLMs should be treated like unlicensed weapons and that developers must be held accountable.

ALSO READ: From SEO keywords to chatbots, brands pivot to capture AI-driven traffic

Experts call for stronger oversight and design

Dr Ihsen Alouani, an AI security researcher at Queen’s University Belfast, warned that jailbroken chatbots could provide instructions for weapon-making, spread disinformation, or run sophisticated scams.

“A key part of the solution is for companies to invest more seriously in red teaming and model-level robustness techniques, rather than relying solely on front-end safeguards,” he was quoted as saying by The Guardian.

“We also need clearer standards and independent oversight to keep pace with the evolving threat landscape," he added.

Prof Peter Garraghan of Lancaster University echoed the need for deeper security measures.

“Organisations must treat LLMs like any other critical software component—one that requires rigorous security testing, continuous red teaming and contextual threat modelling,” he said.

“Real security demands not just responsible disclosure, but responsible design and deployment practices," Garraghan added.

How tech companies are responding

OpenAI, which developed ChatGPT, said its newest model can better understand and apply safety rules, making it more resistant to jailbreaks. The company added it is actively researching ways to improve protection.

Meanwhile, Microsoft responded with a link to a blog post on its security work. Google, Meta, and Anthropic are yet to comment.

ALSO READ: Google I/O 2025: AI Mode brings virtual try-on, smart checkout for shopping

Already subscribed? Log in

Subscribe to read the full story →

^*Subscribe to Business Standard digital and get complimentary access to The New York Times

Smart Quarterly

₹900

3 Months

₹300/Month

SAVE 25%

Smart Essential

₹2,700

1 Year

₹225/Month

SAVE 46%

*Complimentary New York Times access for the 2nd year will be given after 12 months

Super Saver

₹3,900

2 Years

₹162/Month

Renews automatically, cancel anytime

Here’s what’s included in our digital subscription plans

Exclusive premium stories online

Over 30 premium stories daily, handpicked by our editors

Complimentary Access to The New York Times

News, Games, Cooking, Audio, Wirecutter & The Athletic

Business Standard Epaper

Digital replica of our daily newspaper — with options to read, save, and share

Curated Newsletters

Insights on markets, finance, politics, tech, and more delivered to your inbox

Market Analysis & Investment Insights

In-depth market analysis & insights with access to The Smart Investor

Ad-free Reading

Uninterrupted reading experience with no advertisements

Seamless Access Across All Devices

Access Business Standard across devices — mobile, tablet, or PC, via web or app