
Microsoft Copilot: Screen Reader, Deep Thinker, Vocal Assistant

A week after announcing a wave of updates for its enterprise suite of Copilot AI-powered products, Microsoft is launching new Copilot capabilities on Windows for all users, including a tool that can understand and respond to questions about what’s on your screen.

Refreshed Copilot apps for iOS, Android, Windows, and the web are rolling out today, and all feature a Copilot with a more “warm” and “distinct” style, as Microsoft describes it. Microsoft is also bringing the chatbot to WhatsApp, letting users chat with Copilot via DM, similar to the experience you get with other bots on Meta’s messaging platform.

Copilot Vision

Copilot Vision can see what you’re looking at on your PC — more specifically, it gets a lens into the sites you’re visiting with Microsoft Edge. Gated behind Copilot Labs, a new Copilot Pro-exclusive opt-in program for experimental Copilot capabilities, Copilot Vision can analyze text and images on web pages and answer queries about them (e.g., “What’s the recipe for the food in this picture?”).

Vision, which can be pulled up by typing “@copilot” in Edge’s address bar, isn’t exactly a technical marvel. Google offers similar search technology on Android, and recently brought bits and pieces of that tech to Chrome as well.

But Microsoft suggests that Copilot Vision is more powerful and conscious of privacy than previous screen-analyzing features.

“Copilot Vision can … suggest next steps, answer questions, help navigate whatever it is you want to do, and assist with tasks, all while you simply speak to it in natural language,” Microsoft wrote in a blog post shared with TechCrunch. “Imagine you’re trying to furnish a new apartment. Copilot Vision can help you search for furniture, find the right color palette, think through your options on everything from rugs to throws, and even suggest ways of arranging what you’re looking at.”

Using Copilot Vision to ask questions about a photo on the web. Image Credits: Microsoft

No doubt eager to avoid another round of bad press from AI privacy fumbles, Microsoft is stressing that Copilot Vision was designed to delete data immediately following conversations. Processed audio, images, or text aren’t stored or used to train models, the company claims — at least not in this preview version.

Copilot Vision is also limited in the types of websites that it can interpret. For the time being, Microsoft’s blocking the feature from working on paywalled and “sensitive” content, limiting Vision to a pre-approved list of “popular” web properties.

What does “sensitive” content entail, exactly? Porn? Violence? At this juncture, Microsoft wouldn’t say.

Accusations of circumventing paywalls with AI tools have landed Microsoft in legal hot water in the recent past. In an ongoing lawsuit, The New York Times alleged that Microsoft allowed users to get around the newspaper’s paywall by serving Times articles through the Copilot chatbot on Bing. When prompted in a certain way, Copilot — which is powered by models from close Microsoft collaborator OpenAI — would give verbatim (or close-to-verbatim) snippets of paid stories, according to The Times.

Microsoft said that Copilot Vision, which is U.S.-only at the moment, will respect sites’ “machine-readable controls on AI” — like rules that disallow bots from scraping data for AI training. But the company hasn’t said precisely which controls Vision will respect; there are several in use. We’ve asked Microsoft for clarification.
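For context, the most widely adopted “machine-readable controls on AI” today are robots.txt directives targeting AI crawlers by user agent. A sketch of what a publisher’s opt-out might look like — GPTBot and Google-Extended are real user-agent tokens documented by OpenAI and Google, but whether Copilot Vision honors these particular directives is exactly the unanswered question above:

```
# robots.txt — opting out of AI-training crawlers
User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended   # controls use of content for Gemini training
Disallow: /
```

Note that these directives govern crawling and training, not necessarily a browser-side assistant reading a page the user has already loaded — which is part of why it matters which controls Microsoft commits to.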

Many major publishers have opted to block AI tools from trawling their websites not only out of fear their data will be used without permission, but also to prevent these tools from sending their server costs soaring. If the current trend holds, Copilot Vision may not work on some of the web’s top news sites.

Microsoft said it’s committed to “taking feedback” to allay concerns.

“Before we launch broadly, we’ll continue to … refine our safety measures and keep privacy and responsibility at the center of everything we do,” Microsoft said in the blog post. “There is no specific processing of the content of a website you are browsing [with Copilot], nor any AI training — Copilot Vision simply reads and interprets the images and text it sees on the page for the first time along with you.”

Think Deeper

As with Vision, Copilot’s new Think Deeper feature is an attempt to make Microsoft’s assistant more versatile.

Think Deeper gives Copilot the ability to reason through more complex problems, Microsoft said, thanks to “reasoning models” that take more time before responding with step-by-step answers.

Which reasoning models? Microsoft was a bit cagey when I asked, saying only that Think Deeper uses “the latest models from OpenAI, fine-tuned by Microsoft.” Reading between the lines, it’s a safe bet that the underlying model is a customized version of OpenAI’s o1.

“We’ve designed Think Deeper to be helpful for all kinds of practical, everyday challenges, like comparing two complex options side by side,” Microsoft wrote in a blog post. “Think Deeper can help with anything from solving tough math problems to weighing up the costs of managing home projects.”

Microsoft talked up Think Deeper’s potential quite a bit in its press materials. But assuming the model underneath is o1, it will almost certainly fall short in some areas. We’re curious to see what sort of enhancements Microsoft made to the base model and how forthcoming Think Deeper is about its limitations.

Think Deeper will be available from today to a limited number of Copilot Labs users in Australia, Canada, New Zealand, the U.S., and the U.K.

Copilot Voice

A new Copilot feature generally available today is Copilot Voice (not to be confused with GitHub’s Copilot Voice). Launching in English in New Zealand, Canada, Australia, the U.K., and the U.S. to start, Voice adds four synthetic voices, letting you talk to Copilot and have its responses be spoken aloud.


Like OpenAI’s Advanced Voice Mode for ChatGPT, Copilot Voice can pick up on your tone during conversations and respond accordingly, and you can interject at any point while Copilot Voice is answering. A Microsoft spokesperson told me that the mode uses “the latest voice technology with new models that have been fine-tuned for the Copilot app.” What tech? Which models? On the specifics, mum’s the word.

One thing to be aware of: Copilot Voice has a time-based usage limit. Copilot Pro subscribers get more minutes, but the exact number is “variable,” Microsoft told me, depending on demand.

Personalization

Copilot will soon become more tailored to your likes and preferences, Microsoft said, thanks to a new personalization setting.

When the setting is enabled, Copilot will draw on your past interactions and history, as well as your interactions with other Microsoft apps and services (Microsoft won’t say which) to recommend ways to use Copilot.

“This helps you get going,” Microsoft wrote in a blog post, “offering both a handy guide to Copilot’s useful features and conversation starters.”

Personalization in Copilot, which can be switched off in the Copilot settings menu on Windows, isn’t slated for the U.K. or EU anytime soon. But users elsewhere should begin to see the setting this afternoon.

Microsoft and the EU have had a testy relationship when it comes to the company’s AI product rollouts. In May, the EU warned Microsoft that it could be fined up to 1% of its global annual turnover under the bloc’s online governance regime, the Digital Services Act, after the company failed to respond to a request for information about its generative AI tools.

A number of tech giants beyond Microsoft, including Apple and Meta, have taken a cautious approach to launching AI tools in the EU, wary of running afoul of the bloc’s laws governing data privacy and model deployment.

“For users in the European Economic Area (EEA) and a limited number of other countries, we are evaluating options before offering this level of Copilot personalization for those users,” a Microsoft spokesperson told TechCrunch. “Some features will not be available in the EEA until a later date.”
