OpenAI's Operator AI agent can book tickets, order groceries: How it works

Powered by the Computer-Using Agent (CUA) model, OpenAI's Operator can fill out forms, order groceries, and book tickets by interacting with webpages on its own

OpenAI Operator (Picture: OpenAI)
Harsh Shivam New Delhi
Last Updated: Jan 24 2025 | 12:59 PM IST
Microsoft-backed OpenAI has launched a web-based artificial intelligence (AI) agent called "Operator." Currently a research preview, the new AI agent can perform web-based tasks without user intervention. OpenAI said that Operator uses a web browser and can interact with webpages on its own by typing, clicking, and scrolling.
 
"Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it," said OpenAI in a blog post.
 
What are AI agents?
 
AI agents are advanced software systems developed to perform complex, multi-step tasks with minimal user intervention. These systems operate independently, streamlining repetitive processes that typically require manual handling.
 
In addition to natural language processing, AI agents can analyse data, make decisions, and interact with their surroundings to complete assigned tasks. They utilise various inputs, such as text, images, and audio, to gather information and autonomously achieve objectives based on predefined goals. While users define the desired outcome, the agents identify and execute the most effective approach to reach it.

OpenAI Operator: What is it
 
Operator is an AI agent designed to analyse and interact with webpages. It can perform various repetitive browser tasks, such as filling out forms, ordering groceries, or even creative activities like making memes, using actions like typing, clicking, and scrolling.
 
OpenAI Operator: How it works
 
Operator is powered by a new AI model called the Computer-Using Agent (CUA). This model combines GPT-4o's vision capabilities with advanced reasoning to interact with graphical user interfaces (GUIs) on webpages, including buttons, menus, and text fields.
 
The AI agent functions by capturing screenshots to analyse the content on the screen and interacting with it using actions similar to those performed with a mouse and keyboard. It operates within a web browser, eliminating the need for custom API (Application Programming Interface) integrations.
 
When Operator encounters an error, it uses its reasoning abilities to self-correct. If it cannot resolve the issue, it returns control to the user.
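
As a rough illustration of the loop described above, the Python sketch below captures a screenshot, chooses an action, performs it, retries on failure, and hands control back to the user after repeated errors. Everything in it is hypothetical: Action, FakeBrowser, make_policy, and run_agent are names invented for this example and do not reflect OpenAI's actual interfaces or the CUA model, which reads real screen pixels rather than a text string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # on-screen element the action applies to
    text: str = ""     # text to enter, for "type" actions


class FakeBrowser:
    """Toy stand-in for a remote browser (an assumption, not a real API)."""

    def __init__(self, fail_times: int = 0) -> None:
        self.log: List[Action] = []
        self._fail_times = fail_times  # how many upcoming actions will fail

    def screenshot(self) -> str:
        # A real agent would receive pixels; a text summary stands in here.
        return f"screen after {len(self.log)} actions"

    def perform(self, action: Action) -> bool:
        if self._fail_times > 0:
            self._fail_times -= 1
            return False
        self.log.append(action)
        return True


def make_policy(plan: List[Action]) -> Callable[[str], Action]:
    """Scripted replacement for the model: reads progress off the 'screenshot'."""

    def choose(screen: str) -> Action:
        done = int(screen.split()[2])
        return plan[done] if done < len(plan) else Action("done")

    return choose


def run_agent(browser: FakeBrowser, choose_action: Callable[[str], Action],
              max_steps: int = 10, max_retries: int = 2) -> str:
    """Screenshot -> decide -> act, retrying failures before giving up."""
    retries = 0
    for _ in range(max_steps):
        screen = browser.screenshot()
        action = choose_action(screen)
        if action.kind == "done":
            return "completed"
        if browser.perform(action):
            retries = 0
        else:
            retries += 1  # try again from a fresh screenshot
            if retries > max_retries:
                return "handed_to_user"
    return "handed_to_user"


if __name__ == "__main__":
    plan = [Action("click", "search box"), Action("type", "search box", "milk")]
    print(run_agent(FakeBrowser(fail_times=1), make_policy(plan)))
```

In this toy version, "self-correction" is only a retry from a fresh screenshot; the article describes the real agent as using its reasoning abilities to recover, which a scripted policy cannot reproduce.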
 
OpenAI Operator: How to use
 
Users instruct Operator by describing the task they want it to complete. They can also provide custom instructions for specific sites, such as airline preferences for booking flights. Operator can also run multiple tasks simultaneously.
 
For tasks requiring login credentials, payment details, or CAPTCHA solving, Operator prompts the user to take over. Users can also regain control of the remote browser at any time.
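
The handoff rule in the paragraph above can be pictured as a small check that runs before each step. The Python sketch below is only an illustration under stated assumptions: the step labels in SENSITIVE_KINDS and the function next_controller are hypothetical names, and how Operator actually detects these situations is not described in the source.

```python
# Step types the article says trigger a handoff to the user.
# The labels themselves are made up for this sketch.
SENSITIVE_KINDS = {"login", "payment", "captcha"}


def next_controller(step_kind: str, user_requested_control: bool = False) -> str:
    """Return who should drive the browser for the next step."""
    # The user can take over at any time, regardless of the step type.
    if user_requested_control or step_kind in SENSITIVE_KINDS:
        return "user"
    return "agent"


if __name__ == "__main__":
    for kind in ("scroll", "payment"):
        print(kind, "->", next_controller(kind))
```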

OpenAI Operator: Availability
 
Operator is currently available as a research preview for Pro-tier subscribers in the United States via a dedicated webpage. OpenAI plans to extend access to Plus, Team, and Enterprise subscribers in the future. Once testing concludes, Operator will be integrated into ChatGPT.


First Published: Jan 24 2025 | 12:54 PM IST
