Gemini Omni explained: Google's AI model for video creation from any input

Google has introduced Gemini Omni, a new AI video model that can create and edit cinematic clips from text, image, audio, and video inputs with realistic physics.

Google introduces Gemini Omni with AI video generation that understands real-world physics like gravity, movement, and fluid dynamics. (Image: Google)

Sweta Kumari New Delhi

Last Updated: May 20, 2026 | 3:22 PM IST

Google kicked off its annual developer conference, Google I/O 2026, on May 19 with a keynote event focused on Gemini and the company’s broader AI ambitions. Alongside the new Gemini 3.5 family of AI models, Google also introduced Gemini Omni, a new multimodal AI model designed for cinematic video generation and editing. According to Google, Gemini Omni can combine text prompts, images, audio, and video clips to create high-quality videos while also allowing users to edit scenes conversationally using natural language prompts.

The company said the model is also designed to better understand real-world physics concepts like gravity, movement, and fluid dynamics to make AI-generated scenes look more realistic and visually consistent.

Gemini Omni: What’s new

Google said that Gemini Omni is built around the idea of combining Gemini’s reasoning capabilities with content creation tools. The company describes it as a model that can “create anything from any input,” starting with video generation. According to Google, users can upload images, videos, text, or voice references and generate a single cohesive video output grounded in Gemini’s understanding of the real world.

One of the major changes with Omni is that users can edit videos through conversation instead of relying on traditional editing software. Google said that each instruction builds on the previous one, meaning scenes, characters, and visual consistency remain intact across edits.

Google highlighted several video editing capabilities coming with Gemini Omni:

Edit videos using natural language: Users can edit videos simply by typing prompts, while Gemini Omni keeps scenes, characters, and physics consistent across changes.
Transform scenes and environments: Users can modify specific parts of a video or completely change the overall setting and visual style.
Change actions inside videos: Gemini Omni can alter what is happening in a scene, add new characters or objects, and reimagine moments differently.
Refine videos over multiple edits: Users can continue making changes across multiple prompts without losing continuity from the original scene.

According to Google, Omni is also designed to better understand real-world physics concepts such as gravity, movement, and fluid dynamics, helping it generate scenes that appear more realistic. The company said the model combines Gemini’s knowledge of science, history, and culture with visual generation tools to create more context-aware and meaningful storytelling experiences rather than simply generating clips based on pattern matching.

Users can also upload drawings, reference photos, or existing footage and use them as the foundation for AI-generated scenes. According to the company, Omni can also apply visual styles, motion effects, and scene transitions based on either uploaded references or written prompts.

Additionally, Google said that users will be able to create a digital version of themselves using their own voice and appearance. The company added that voice references will be supported initially for audio-based inputs, while broader audio support will arrive later.

Transparency

As part of its responsible AI push, Google confirmed that all videos created using Gemini Omni will include its invisible SynthID digital watermark. According to the company, users will also be able to verify whether a video was generated using Gemini Omni through tools integrated into the Gemini app, Chrome and Google Search.

Google has also open-sourced SynthID text watermarking technology so developers can integrate it into their own AI models and tools. The company also announced a partnership with NVIDIA earlier this year to expand the use of SynthID beyond Google products. OpenAI and ElevenLabs will also use SynthID watermarking, helping identify more AI-generated content across the web.

How is Gemini Omni different from Google Veo?

Google’s Veo 3.1 “Ingredients to Video” and Gemini Omni are both AI video tools, but they are built for slightly different types of users and workflows. Veo 3.1 is more focused on controlled video generation using reference images or “ingredients.” Users can upload up to three images, such as characters, backgrounds, or objects, and Veo tries to maintain consistency across scenes. Google is positioning it more as a creator and production-focused tool for generating short cinematic clips with stable characters, reusable environments, vertical video support, and better scene continuity.

Gemini Omni, on the other hand, is being positioned as a broader multimodal AI creation model integrated directly into the Gemini ecosystem. Unlike Veo’s more structured “ingredients” workflow, Omni is designed to let users edit videos conversationally using natural language prompts while also combining text, audio, images, and video. Google says Omni can reason about physics, storytelling, movement, and real-world context to make scenes feel more natural and logical.

In simple terms, Veo 3.1 is more about controlled AI video production, while Gemini Omni feels more like an AI creative assistant that can generate and continuously edit content through conversation inside Gemini apps and services.

Rollout details

According to Google, the first model in the Omni family is called Gemini Omni Flash. The model is rolling out globally starting today for Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow. Google also confirmed that Gemini Omni Flash will be available at no additional cost for users of YouTube Shorts and the YouTube Create app starting this week. The company added that support for developers and enterprise customers through APIs will arrive in the coming weeks.