OpenAI trying to steal Scarlett Johansson's voice to make AI feel 'comfortable' is the reason why it's so worrying
What you need to know
Scarlett Johansson says she was approached by OpenAI last year about using her voice for a ChatGPT voice assistant.
Though Johansson did not agree to the proposition, OpenAI shipped its GPT-4o model with a voice called “Sky” that sounds quite similar to Johansson’s.
After legal pressure, OpenAI removed Sky from GPT-4o, and said that the voice was not based on Johansson’s.
Still, in trying to use a friendly and welcoming voice to make AI feel more comforting, OpenAI ended up doing the opposite.
OpenAI made waves last week when it announced GPT-4o, a multimodal AI model that may be the most advanced and futuristic one we’ve seen to date. It sounds like a human, can interact with users via vision and audio, and is knowledgeable. OpenAI ended up beating Google to the punch, and GPT-4o seems more advanced than Project Astra, which Google previewed at Google I/O 2024.
But one of the voices OpenAI chose for GPT-4o has been drawing attention online for all the wrong reasons. First, some users on social media pointed out they thought the voice “Sky” was overly flirty and sultry to the point it was unsettling. Then, people started noticing the similarities between the voice of Sky and that of Scarlett Johansson, the award-winning actress. Now, it appears that may have been intentional.
To be clear, OpenAI denies that the voice of Sky was based on Johansson and even released a blog post explaining how the voices were chosen. However, Johansson put out a scathing statement telling the story of how OpenAI approached her about officially voicing GPT-4o, which she declined. After facing legal pressure from Johansson’s lawyers, the company removed the Sky voice option from GPT-4o.
As distressing as this situation is, it’s also ironic. OpenAI CEO Sam Altman told Johansson that having her as the official voice of ChatGPT would be comforting to users. And yet, by releasing a voice so similar to Johansson’s without her permission, Altman and OpenAI ended up perfectly encapsulating everything that makes people uncomfortable about AI.
Did OpenAI steal Scarlett Johansson’s voice?
Though OpenAI says that it sought out professional voice actors for GPT-4o and did not seek someone who sounded like Johansson specifically, the evidence might tell a different story. It starts in September 2023, according to Johansson, when OpenAI’s Altman reached out about hiring her as a voice actor for ChatGPT.
“He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI,” she said in a statement to NPR’s Bobby Allyn. “He said he felt that my voice would be comforting to people.”
Johansson eventually decided not to go forward with voicing GPT-4o. However, it’s easy to hear the resemblance to her in the Sky voice that ended up being demoed and shipped with the AI model. To say that Johansson was displeased with the result would be an understatement.
“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” OpenAI said in a blog post.
The whole reason that OpenAI wanted a voice like Johansson’s, as Altman is said to have told her, is to make AI more comforting. People may be more scared about AI than they are excited about it. Those in creative industries, especially, are finding that AI is being used to automate writing, visual art, music, and other mediums. This isn’t unique to OpenAI: Apple recently came under fire and apologized for an advertisement that showed instruments being crushed into pieces and replaced with an iPad.
By releasing a GPT-4o voice that so closely resembles hers without her permission, whether intentionally or unintentionally, OpenAI ended up validating the very discomfort with AI that it was trying to address. Creatives, from actors and actresses to writers and photographers, are worried about being replaced by AI. The idea that OpenAI could have mimicked Johansson’s voice for GPT-4o is exactly the kind of thing that alarms people in creative industries.
“When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference,” Johansson wrote, explaining that she asked OpenAI to show how it developed the Sky voice. “In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity.”
We shouldn’t want AI to sound this human
Aside from the unsettling idea that a company could rip off an actress’s voice after failing to reach a deal with her, there are other reasons why we shouldn’t want AI voices to sound like Sky. All of OpenAI’s GPT-4o voices, and especially Sky, sound very human-like. This is a problem because people place a high level of trust in human voices and find them familiar. When you talk to a voice assistant like Siri or Alexa, it’s clear that you’re talking to, for lack of a better word, a robot. After having a conversation with GPT-4o, that clarity won’t always be in the back of your mind.
Right now, AI models have a problem: they confidently state their answers as fact even when they are blatantly wrong. People still end up accepting AI responses as true despite the array of warnings that come along with them. As voices for AI models become more human-sounding, this problem will only get worse. It’ll be easy for the average user of an AI tool to believe what is being said thanks to the welcoming human voice it uses.
In trying to make people more comfortable with the future of AI, OpenAI ended up making it feel more dystopian. We shouldn’t want AI to sound as human as GPT-4o, and there are plenty of reasons why. It could foster an unwarranted level of trust between users and AI models, as well as put creatives like Johansson in a precarious position.