
Google tests Gemini 2.5 Computer Use: How's it different from other models

Google's new Gemini 2.5 Computer Use AI model can interact with apps and websites by clicking, typing, and scrolling like a human. Here's how it works

Gemini 2.5 Computer Use AI model (image)

Harsh Shivam | New Delhi



Google has previewed a new AI model called “Gemini 2.5 Computer Use.” According to the company, it is designed to help AI agents directly interact with websites and apps through their graphical user interfaces (GUIs). Built on top of Gemini 2.5 Pro, this specialised model gives AI systems the ability to perform digital tasks that typically require human-like actions such as clicking buttons, filling out forms, or navigating between pages.
 
According to Google, the model outperforms other AI systems on multiple web and mobile control benchmarks while operating with lower latency. It is now available in public preview through the Gemini API in Google AI Studio and Vertex AI.
 

How is it different from other models?

Traditional AI models often interact with software through structured APIs, which are essentially predefined connections that let one program talk to another. But many real-world digital tasks still rely on interfaces built for humans, such as booking appointments, submitting forms, or browsing online dashboards. That's where Gemini 2.5 Computer Use comes in.

How it works

At its core, the Gemini 2.5 Computer Use model operates in a loop. It receives three inputs: the user’s request, a screenshot of the app or webpage it’s viewing, and a record of its recent actions.
 
Using this information, the model analyses what it sees on screen and decides on the next step, such as clicking a button, typing into a field, or scrolling down a page. In some cases, it can even ask the user for confirmation before carrying out actions like making a purchase.
 
Once an action is completed, the system takes a new screenshot, updates its context, and continues the process until the task is done or the user stops it. For now, it’s primarily optimised for web browsers, though Google says it already shows strong performance on mobile app interfaces too. Support for full desktop control isn’t available yet.
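
In outline, that loop looks something like the sketch below. This is only an illustration of the flow Google describes, not Google's code: take_screenshot, propose_next_action, and execute_action are hypothetical placeholders for the screenshot capture, the model call through the Gemini API, and the client-side action executor.

```python
# Illustrative sketch of the observe-decide-act loop described above.
# All helper functions are hypothetical placeholders, not part of any Google SDK.

def take_screenshot() -> bytes:
    """Capture the current state of the browser window (placeholder)."""
    raise NotImplementedError

def propose_next_action(request: str, screenshot: bytes, history: list) -> dict:
    """Ask the model (e.g. via the Gemini API) for the next UI action (placeholder)."""
    raise NotImplementedError

def execute_action(action: dict) -> None:
    """Perform the action (click, type, scroll) in the browser (placeholder)."""
    raise NotImplementedError

def run_agent(request: str, max_steps: int = 20) -> None:
    history: list[dict] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                               # 1. observe the UI
        action = propose_next_action(request, screenshot, history)  # 2. model decides
        if action.get("type") == "done":                             # task finished
            break
        if action.get("requires_confirmation"):                      # e.g. before a purchase
            if input(f"Allow '{action['type']}'? [y/N] ").strip().lower() != "y":
                break
        execute_action(action)                                       # 3. act, then loop again
        history.append(action)
```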

Some examples

In one of Google’s demo scenarios, the model was asked to collect details about pets from a website and add them as guests in a CRM system — then schedule a follow-up appointment with a specialist. In another, it organised a cluttered virtual whiteboard by dragging digital sticky notes into their correct categories.
These examples show how the model can perform multi-step tasks involving different websites and tools, mimicking how a human might complete such actions manually. Google also shared a demo on Browserbase, where users can watch it perform activities like playing the 2048 puzzle game or browsing Hacker News for trending discussions.

How it’s being used

Google said versions of the Computer Use model have already been deployed internally for UI testing, helping developers automatically verify how software behaves — a process that can save significant development time. It also powers some of Google’s agentic capabilities, such as those found in AI Mode in Search, Project Mariner, and the Firebase Testing Agent.

Availability

The Gemini 2.5 Computer Use model is now in public preview, accessible via the Gemini API in Google AI Studio and Vertex AI. Google also offers a live demo environment on Browserbase, where anyone can see the model in action.
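
For developers, trying the preview starts with a standard Gemini API call. The snippet below is a minimal sketch using the google-genai Python SDK; the model ID shown is an assumption about the preview naming, and a real Computer Use session would also need the computer-use tool configuration and the screenshot inputs described above, which are omitted here.

```python
# Minimal, illustrative Gemini API call with the google-genai Python SDK.
# Assumption: the preview model ID below; check the Google AI Studio / Vertex AI
# docs for the exact name. A full Computer Use session would also pass the
# computer-use tool configuration and screenshots, which are not shown here.
from google import genai

client = genai.Client()  # reads the GEMINI_API_KEY environment variable

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview",  # assumed preview model ID
    contents="Describe the next UI action to take on the attached screenshot.",
)
print(response.text)
```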


First Published: Oct 08 2025 | 10:40 AM IST
