Researchers at Apple have published a new paper detailing their prototype AI-powered animation tool, Keyframer. According to Apple, Keyframer uses OpenAI's GPT-4 model to generate animated illustrations from 2D images based on the user's prompt. It also offers multiple editing modes, allowing users to directly edit the generated animations or explicitly prompt the tool to adjust an animation in a particular way.
According to the research paper, Keyframer builds on a large language model's (LLM) ability to generate code from text prompts. The tool takes a Scalable Vector Graphics (SVG) file, a 2D image format, as input and generates CSS code that animates the image according to the user's request. CSS (Cascading Style Sheets) code describes how elements should appear and move on screen. Keyframer uses the SVG format because such images can be scaled and resized without any loss in quality.
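The paper's own examples are not public, but the general technique it describes, CSS keyframe animation applied to an element inside an SVG, can be illustrated with a minimal, hypothetical sketch. The selector #sun below is an assumed element id in the input SVG, not one taken from Apple's paper; the CSS simply moves that shape upward while fading it in, and loops the effect.

```css
/* Illustrative sketch only, not actual Keyframer output.
   Assumes the input SVG contains an element with id="sun". */
#sun {
  /* Run the "rise" keyframes over 3 seconds, back and forth, forever. */
  animation: rise 3s ease-in-out infinite alternate;
}

/* Keyframes shift the shape upward and fade it in over each cycle. */
@keyframes rise {
  from {
    transform: translateY(40px);
    opacity: 0.2;
  }
  to {
    transform: translateY(0);
    opacity: 1;
  }
}
```

In Keyframer's workflow as the paper describes it, GPT-4 would produce CSS of this kind from a natural-language prompt, which the user could then refine through the tool's editing modes.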
For editing the generated animations, Apple said the tool provides two modes: a Code Editor, where the user can edit the CSS code directly, and a Properties Editor, which offers a "dynamically created UI-layer" for adjusting CSS properties. The Properties Editor is designed for users who are less familiar with coding; Apple said it is modelled after UI elements found in established graphic editing software such as Adobe Illustrator.
Keyframer is not publicly available and is still in the early stages of development. So far, the tool has only been tested on simple animation tasks, such as generating loading sequences and data visualisations; producing complex animations from simple text prompts is not yet possible.
Earlier this month, Apple published another research paper explaining its MLLM-Guided Image Editing (MGIE) AI model, which can edit an image using text prompts. Apple's MGIE model can handle a wide range of editing scenarios, from simple colour adjustments to more complex object manipulations.
The MGIE model consists of a multimodal large language model that expands the user's request into "concise expressive instructions", which a diffusion model then uses to edit the input image. According to the research paper, this approach allows MGIE to resolve ambiguous user commands and still achieve the desired output.
MGIE is available as an open-source project on GitHub, with its code, data, and pre-trained models available for download.