Wednesday, December 24, 2025 | 03:29 PM ISTहिंदी में पढें
Business Standard
Notification Icon
userprofile IconSearch

OpenAI says AI browsers may never be safe from prompt injection: What it is

OpenAI warns that prompt injection attacks are a long-term risk for AI-powered browsers. Here's what prompt injection means, why it matters, and how it can affect AI agents on the web

ChatGPT Atlas AI browser

OpenAI says prompt injection attacks will remain a long-term security risk for AI-powered browsers (Image: ChatGPT Atlas)

Harsh Shivam New Delhi

Listen to This Article

Even as AI-powered browsers become more capable, OpenAI says a key security risk tied to them may never be fully eliminated. In a recent blog post, the company warned that prompt injection — a technique used to manipulate AI agents through hidden instructions embedded in web pages, emails, or documents — remains an ongoing threat for AI browsers operating on the open web.
 
OpenAI compared prompt injection to online scams and social engineering attacks, arguing that while defenses can be strengthened, the risk itself is unlikely to disappear completely. The admission comes as AI “agent mode” tools, such as browser-based assistants that can read emails and take actions on a user’s behalf, gain wider attention, and expose a larger security attack surface.
 

What is prompt injection?

Prompt injection is a type of attack in which malicious instructions are embedded inside content that an AI system processes. These instructions are crafted to override or redirect the AI’s behaviour, causing it to follow an attacker’s intent instead of the user’s request.
Unlike traditional cyberattacks, prompt injection does not rely on exploiting software vulnerabilities or directly deceiving users. Instead, it targets the AI system itself. For example, an attacker might hide instructions inside an email, document, or webpage that tell an AI agent to ignore its original task and perform an unintended action, such as forwarding sensitive data or sending messages without approval.
 
In simple terms, prompt injection is a form of social engineering for AI systems. Just as humans can be misled by carefully crafted messages, AI agents can be manipulated if they are exposed to untrusted content while carrying out tasks.

Why AI browsers make the problem harder to solve

AI-powered browsers and agentic features significantly expand the risk surface. When an AI agent is allowed to read inboxes, browse the web, click links, and perform actions like sending emails or making purchases, it inevitably encounters large volumes of untrusted content.
 
This is where the risk escalates. A user might ask an AI agent to summarise unread emails or draft a simple response, but in doing so, the agent may ingest a malicious message containing hidden instructions. If the AI follows those instructions, it can go off-task and perform actions the user never intended.
 
OpenAI demonstrated this risk internally with a test scenario in which a malicious email planted in an inbox instructed the AI agent to send a resignation message. When the user later asked the agent to draft an out-of-office reply, the agent encountered the hidden prompt and followed it instead, resulting in an unintended resignation email being sent on the user’s behalf.

Why OpenAI says this may never be fully fixed

OpenAI has been unusually direct in stating that prompt injection is not a problem with a clean or permanent solution. In its blog post, the company said that prompt injection — much like phishing and online scams — is unlikely to ever be fully “solved.”
 
The underlying issue is structural. AI agents are designed to interpret and act on natural-language instructions, while the open web is filled with content that cannot be fully trusted. As long as AI systems are expected to operate autonomously across emails, documents, and websites, attackers will continue trying to embed malicious instructions into the content those systems consume.

What users should understand about the risks

The broader takeaway from OpenAI’s admission is that AI browsers are powerful, but they come with inherent trade-offs. Features that allow AI agents to manage emails, interact with websites, and complete tasks also give them access to sensitive data and actions that can be abused if something goes wrong.
 
OpenAI recommends that users limit how much access they grant to AI agents, avoid overly broad instructions, and carefully review confirmation prompts before actions such as sending messages or making payments. These steps do not eliminate the risk, but they reduce the likelihood of hidden or malicious content influencing an agent’s behaviour.
 
The company also said it is working to reduce risk through automated defenses, including an AI-based attacker trained using reinforcement learning. This system is designed to behave like a hacker, repeatedly testing ways to manipulate AI agents in simulated environments. When successful attacks are identified, OpenAI uses them to strengthen safeguards and retrain models before similar exploits appear in real-world scenarios.
 
However, OpenAI itself acknowledges that these efforts are about risk reduction, not complete prevention.

Don't miss the most important news and views of the day. Get them on our Telegram channel

First Published: Dec 24 2025 | 3:27 PM IST

Explore News