Google’s second-generation AI models, starting with Gemini 2.0 Flash, introduce new agentic AI experiences. The company stated that Gemini 2.0 Flash, with its native user interface action capabilities, multimodal reasoning, long-context understanding, complex instruction following, and compositional function-calling, enables AI models to function as agents. But what exactly are AI agents, and how does Google’s new model deliver the agentic experience?
What are AI agents?
AI agents are AI-powered software tools capable of performing multi-step tasks for users with minimal supervision. These autonomous systems can handle repetitive tasks that typically require manual effort. Beyond natural language processing, AI agents can make decisions, solve problems, and interact with their environment to execute actions.
Unlike traditional AI chatbots that rely solely on training data, AI agents store past interactions in memory and use this up-to-date information to plan future actions.
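The plan-act loop with memory described above can be sketched in a few lines. This is an illustrative toy, not Google's implementation: the `Agent` class, its `plan` stub, and the observation strings are all hypothetical, and a real agent would call a language model where `plan` makes its decision.

```python
# Minimal sketch of an agent loop: observe, store in memory, plan, act.
# All names here (Agent, plan, run) are illustrative assumptions,
# not any real agent framework's API.
from dataclasses import dataclass, field


@dataclass
class Agent:
    goal: str
    memory: list = field(default_factory=list)  # past interactions

    def plan(self, observation: str) -> str:
        # Decide the next action from the goal, the latest observation,
        # and stored memory. A real agent queries an LLM here.
        self.memory.append(observation)
        if "done" in observation:
            return "stop"
        return f"act on: {observation}"

    def run(self, observations: list) -> list:
        actions = []
        for obs in observations:
            action = self.plan(obs)
            if action == "stop":
                break
            actions.append(action)
        return actions


agent = Agent(goal="book a table")
print(agent.run(["search restaurants", "pick a slot", "done"]))
# → ['act on: search restaurants', 'act on: pick a slot']
```

The key difference from a stateless chatbot is the `memory` list: each observation is retained and available to later planning steps, which is what lets an agent carry a multi-step task forward.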
What is Google’s agentic experience?
Powered by the new Gemini models, Google has introduced several agent prototypes. These include updates to:
- Project Astra: A research prototype exploring the capabilities of a universal AI assistant.
- Project Mariner: Designed for web browser-based human-agent interaction.
- Jules: An AI-powered code agent for developers.
Project Astra
Previewed at Google’s annual developer conference, Google I/O 2024, Project Astra is a prototype AI agent envisioned as the future of AI assistants. It interacts with the real world by “remembering” what it “sees” and “hears” through a smartphone’s camera and microphone.
With Gemini 2.0, Project Astra has been enhanced with the following features:
- Multilingual and mixed-language conversation capabilities.
- Understanding of accents and uncommon words.
- Access to Google Search, Lens, and Maps.
- Improved personalisation through 10-minute in-session memory.
- Reduced latency for smoother performance.
Project Mariner
Project Mariner is a research prototype designed to explore human-agent interaction via a web browser. It can analyse and reason across information on a browser screen, including pixels and web elements such as text, code, images, and forms.
Using an experimental Chrome browser extension, Mariner can complete tasks by leveraging this information. Although still in its early stages and prone to inaccuracies and delays, Google said the AI agent will improve rapidly over time. Currently, it is available to select testers.
Jules
Jules, Google’s AI-powered code agent, integrates directly into GitHub workflows. It can tackle programming issues, create plans, and execute them under a developer’s supervision.
Developers can use Jules to offload Python and JavaScript tasks, including bug fixes, multi-step planning, and modifying multiple files.
Agents in gaming
Google has developed gaming AI agents using the Gemini 2.0 model. These agents help players navigate virtual environments in video games, reasoning about gameplay based solely on on-screen actions. They can offer real-time suggestions and act as virtual gaming companions.
Additionally, gaming agents can access Google Search to fetch information relevant to the game being played.