Google goes after OpenAI with Veo 2 video generation AI model: Details here
Built on the model Google introduced earlier this year, Veo 2 can generate videos at up to 4K resolution with lengths extending to several minutes, whereas OpenAI's Sora produces short videos at full-HD resolution
Harsh Shivam | New Delhi
Google has enhanced its artificial intelligence-powered image and video generation capabilities. The American tech giant has introduced its second-generation video generation model, Veo 2, alongside improvements to its existing Imagen 3 image-generation model, which now produces brighter and better-composed images. The company has also unveiled a new experimental tool, Whisk, which allows users to stylise and remix images for unique outputs.
Google Veo 2 model
Google said the Veo 2 model is designed to better understand real-world physics, human movement, and expression, enabling it to generate more realistic videos with finer detail. Google claims the model can handle complex requests, including specifications of genre, lens type, and cinematic effects. The new model can generate videos in resolutions up to 4K, with video lengths extending to several minutes.
Veo 2 is integrated into Google Labs' video generation tool, VideoFX. Users can visit Google Labs and join the waitlist for access to the new features. Google also plans to expand Veo 2 to YouTube Shorts and other products next year.
Imagen 3
Google has also enhanced its Imagen 3 image-generation model, which now offers the ability to render a wider variety of art styles with greater accuracy—from photo-realism and impressionism to abstract and anime. The update also improves the model’s ability to follow input prompts more closely and produce images with more detail and texture.
Like Veo 2, the updated Imagen 3 model will be available in Google Labs, in the image-generation tool called ImageFX.
Whisk
Google’s latest experimental tool, Whisk, combines the capabilities of Imagen 3’s image generation with Gemini’s visual understanding and description capabilities. Whisk allows users to input or create images according to their preferences and remix them for unique outputs. When a user inputs an image, Gemini automatically writes a detailed caption, which is then fed into Imagen 3. This process allows the model to generate images in different styles based on the input and description.