I broke Copilot+'s Live Captions feature with a simple question
During a demo of Microsoft’s Copilot+ features at Computex this month, I put the Live Captions translation program to the test with a simple question and got a rather confused response.
Like any good tech trade show, Computex offered plenty of demo opportunities, both on the expo floor and off-site in various hotel suites and conference rooms. Qualcomm had several demos to show off some of the AI PC features coming to Snapdragon X laptops that are part of Microsoft’s Copilot+ program. Those demos included the controversial Recall feature, Co-Creator image generation, and Live Captions.
Because Co-Creator is a tweaked version of the usual Stable Diffusion image generation tool, and Recall presents a potential security risk and depends on long-term usage, I focused on the Live Captions translation demo during my time with Qualcomm.
The demo was run on a Microsoft Surface Pro 11 using the tablet’s onboard microphone array to capture and translate audio. So I trotted out a simple question familiar to anyone who has studied a language, and here's what happened.
Live Captions testing methodology
Live Captions was set to translate into English, which meant breaking out another language. My options were rather limited. I have an elementary grasp of Spanish and can say a handful of phrases in German, Polish, and Mandarin. Like any self-respecting nerd, I studied Latin in middle and high school, which makes me a real hit in ancient history museums and just about nowhere else.
Since this demo was held in Taipei during Computex, all of the Latin orations I had to memorize thanks to a very dedicated high school teacher were the last thing on my mind. I’m also not confident Microsoft would optimize any translation software for a dead language in 2024.
Instead, while surrounded by written Chinese Hanzi characters, I was reminded that I did take Japanese in college for a brief stint. This was useless to me in Taiwan, but the Japanese writing system uses Hanzi (called Kanji) alongside two syllabic writing systems (Hiragana and Katakana). So I can occasionally recognize the meaning of a sign if it doesn’t have an English translation written immediately below or next to it.
But I’m also terrible at coming up with random test sentences off the top of my head. So I went with an old Spanish favorite: ¿Dónde está la biblioteca? Except I asked it in Japanese: 図書館はどこですか (toshokan wa doko desu ka). Everyone’s favorite: Where is the library?
Live Captions failed, or did it?
Unfortunately for Copilot+, Live Captions kicked back a translation of “Where is it?” instead of the expected “Where is the library?”
Part of this could be language optimization issues going from Japanese to English, or microphone interference. While the demo area wasn’t exactly packed, there were many people speaking and recording in the room, which could have kept the microphone from picking up every part of my question. It is, after all, a rather short phrase.
To test this properly, we’ll need more controlled conditions and repeatable trials. That will have to wait until we get the first Copilot+ laptops in after they launch on Tuesday, June 18. It also gives me time to think of a longer, better question to ask.