Google Gemini can now turn your photos into videos with audio: Check how

Google's latest Gemini AI update lets users turn still images into short, sound-enabled videos using its advanced Veo 3 model, now available in India

Veo 3 (Image: Google)
Sweta Kumari New Delhi
Last Updated: Jul 11 2025 | 11:25 AM IST
Google has introduced a new feature for Gemini AI that allows users to animate still photos into eight-second videos with sound, powered by the Veo 3 video generation model. The tool, which can add background noise, ambient audio, or even spoken dialogue, is now rolling out in select regions, including India, for Google AI Pro and Ultra subscribers.
 
The feature is currently available through the web interface, and Google has announced that mobile support will follow later in the week.

Turning stills into video with sound: How it works

With this new tool, users can upload a photo, describe the desired motion, and optionally include prompts for audio effects or narration. Gemini then generates a short 720p video in MP4 format, using a 16:9 landscape layout.
 
Josh Woodward, Vice President of the Gemini app and Google Labs, recently demonstrated the feature on X (formerly Twitter), sharing how a child’s drawing was turned into a short animated clip with synchronised sound. “Still experimental, but we wanted our Pro and Ultra members to try it first! It’s really fun to take kindergarten artwork and make it come to life with sound,” Woodward wrote.
 
To maintain transparency, all videos include a visible “Veo” watermark in the bottom-right corner and a hidden SynthID digital watermark created by Google DeepMind. This invisible signature helps verify that the content was generated by AI.

Here are the steps to use Gemini AI's new photo-to-video feature:

  • Click on the “tools” icon in the prompt bar.
  • Choose the “video” tool from the list.
  • Upload a still image you want to animate.
  • Enter a description of the desired motion.
  • Add optional audio cues (e.g., sound effects, dialogue, ambient sounds).
  • Gemini will generate a short 720p MP4 video in 16:9 format.
  • Audio will automatically sync with the visuals.

Google Veo 3: What is new?

First unveiled at Google I/O, Veo 3 is Google’s most sophisticated video model to date. It can generate realistic visuals and synchronised sound from either text or image-based prompts.
 
A Google blog post explains: “Veo 3 excels from text and image prompting to real-world physics and accurate lip syncing. It’s great at understanding; you can tell a short story in your prompt, and the model gives you back a clip that brings it to life.”

Topics: Google, Gemini AI, artificial intelligence

First Published: Jul 11 2025 | 11:25 AM IST
