Microsoft’s artificial intelligence (AI) assistant feature, Copilot Vision, can now see what’s on users’ screens. Mustafa Suleyman, Chief Executive Officer of Microsoft AI, announced that the feature can interpret what appears on screen in the Edge browser and then help users navigate apps.
Microsoft describes Vision as a “talk-based experience,” which essentially means users interact with it by speaking aloud and waiting for Copilot to respond.
The Microsoft AI CEO added that, if users opt in, the feature will be able to “literally see what you see on screen.” He suggested it could guide users through a recipe while they cook, or decode job descriptions and jump straight into customised interview prep or cover letter brainstorming.
As per a Microsoft support page, “Copilot Vision may highlight portions of the screen to help you find relevant information,” but it doesn’t actually click links or do anything on behalf of users.
Microsoft states that while Copilot Vision sessions are active, the tool records its own responses but does not store user inputs, images, or the content of the pages being viewed. Users can end the session at any time by either closing the browser window or manually stopping the screen sharing.
Currently, the more advanced Copilot Vision features are restricted to those with a Copilot Pro subscription. Subscribers can use these capabilities beyond the Edge browser, enabling Copilot to assist with tasks in programs such as Photoshop and video editing tools, or even provide in-game guidance in titles like Minecraft.