The recent discovery of ChatGPT’s voice imitation tendencies during testing has given us a glimpse of just how unsettling AI can be. OpenAI published the GPT-4o System Card, essentially a scorecard of key risk areas and the measures being taken to mitigate them. One of the findings from the assessment was an AI voice cloning issue in Advanced Voice Mode, where the AI was able to imitate users’ voices without permission, entirely unprompted.
Despite how terrifying that sounds, ChatGPT’s cloned voice incident is not expected to be a major threat. The company has already put safeguards in place against it, but the episode does bring to light the many dangers of AI.

Image: OpenAI
ChatGPT’s Voice Imitation Habits Are Equal Parts Interesting and Concerning
Artificial intelligence presents us with new technological possibilities every day, but these don’t come without a fair number of risks. GPT-4o is set to be the company’s most humanized AI model, with advanced multimodal capabilities that allow users to have a more natural experience while interacting with the AI. The model is also cheaper and faster, and the company promises that it can “match GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages.” Despite the many advancements, unintended consequences do come up and need to be monitored.
OpenAI detected the AI voice cloning issue while conducting its own investigation to identify key risk areas and mitigate them sufficiently. The areas evaluated included ChatGPT’s unauthorized voice imitation ability, speaker identification concerns, ungrounded inference and sensitive trait attribution, and the generation of disallowed audio content, including erotic and violent speech. The company has put checks in place to address all these issues.
More on the AI Voice Cloning Issue
The GPT-4o voice cloning risk was explained only in brief, with the OpenAI team describing the voice generation capability, its dangers, and the steps taken to minimize them. The AI can create audio with a “human-sounding synthetic voice,” so it is possible for the tool to use an input clip to create something new.
This is just as dangerous as it sounds, as anyone could use a recording of a person’s voice to create an entirely new audio file without permission. In this era of misinformation, such audio could spread like wildfire with no one aware of its origins. Platforms like YouTube, Google, and Instagram are doing what they can to combat the spread of inaccurate information and to tag AI-generated content where possible, but we have a long way to go before these systems are perfected.
OpenAI found some rare instances of the AI doing this unprompted, generating output that emulated the user’s voice. To combat ChatGPT’s voice imitation tendencies, the company has added restrictions that force the AI to use preset voices, collected from voice actors, for all of its output generation. It also built a standalone output classifier to detect whether GPT-4o had begun voice cloning and strayed to a voice outside the pre-approved list. If the voice doesn’t match the list, the output is blocked.
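OpenAI has not published implementation details for this classifier, but conceptually such a check might compare a speaker embedding of the generated audio against embeddings of the approved preset voices and block anything that falls too far from all of them. The sketch below illustrates that idea in Python; `output_is_approved_voice`, the embedding vectors, and the similarity threshold are all illustrative assumptions, not OpenAI’s actual system.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def output_is_approved_voice(output_embedding: np.ndarray,
                             preset_embeddings: list[np.ndarray],
                             threshold: float = 0.85) -> bool:
    """Allow an output only if it closely matches one of the preset voices.

    `output_embedding` would come from a speaker-embedding model (e.g. an
    x-vector-style extractor) run on the generated audio. The threshold is
    an arbitrary placeholder, not a published OpenAI value.
    """
    best_match = max(cosine_similarity(output_embedding, p)
                     for p in preset_embeddings)
    return best_match >= threshold

# Toy demonstration with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
presets = [rng.normal(size=128) for _ in range(4)]
near_preset = presets[0] + rng.normal(scale=0.05, size=128)  # close to a preset voice
print(output_is_approved_voice(near_preset, presets))            # True: allowed
print(output_is_approved_voice(rng.normal(size=128), presets))   # False: blocked
```

In this framing, “the output is blocked” simply means the audio is discarded whenever the check returns False, before anything reaches the user.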
Our system currently catches 100% of meaningful deviations from the system voice based on our internal evaluations, which include samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples.
—OpenAI
The measures put in place appear satisfactory, and OpenAI claims the residual risk of unauthorized voice generation is minimal, even though the AI voice cloning issue persists as a flaw in the model.
The ChatGPT Cloned Voice Incident Opens the Conversation on the Need for Safeguards Against the Misuse of AI
GPT-4o and its voice cloning abilities came to light through OpenAI’s own report, so it is good to see the company take ownership of the flaw and put checks in place to safeguard against it. Companies working on AI models need to be exceptionally thorough in checking their tools for potential vulnerabilities and security issues that might emerge after release. If the GPT-4o voice cloning had remained undetected at this stage, the rollout might have led to hundreds of cases of misuse before checks could be put in place.
If ChatGPT is capable of voice imitation, it’s safe to say that other models will get there soon enough, opening the door to similar misuse. While a number of companies rely on OpenAI to power their services, there is a growing catalog of models that could evolve the same capabilities. Now that we know the AI has imitated a user’s voice, the general public might begin testing the limits of the safeguards to see if there is a way around them.
It’s also worth noting that the ChatGPT voice imitation issue isn’t the only problem that came to light. The AI was caught making inferences about users without any grounding in the audio content, or, in some cases, using audio cues to determine details like a speaker’s nationality. Both are tendencies we need to be wary of, and both are issues the company has addressed.
Inappropriate audio content was also found to be easier to elicit than inappropriate text. As a result, OpenAI has added a text transcription filter that blocks audio generation if the transcript is found to be inappropriate. It’s hard to predict every roundabout route the global population might take to get the AI to do what it wants, but OpenAI’s countermeasures are a good start.
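OpenAI hasn’t detailed how the transcription filter works, but the general pattern is straightforward: transcribe the candidate audio, screen the resulting text, and only release the audio if the text passes. The sketch below assumes a hypothetical `transcribe` function standing in for any speech-to-text model, and uses a toy blocklist where a production system would use a trained moderation classifier.

```python
from typing import Callable

# Placeholder policy list; a real system would use a moderation model,
# not simple keyword matching.
BLOCKED_TERMS = {"placeholder_slur", "placeholder_threat"}

def audio_output_allowed(audio: bytes,
                         transcribe: Callable[[bytes], str]) -> bool:
    """Transcribe generated audio and screen the text before release.

    `transcribe` is a stand-in for a speech-to-text model, not a real
    OpenAI API. If the transcript trips the text filter, the audio
    output is blocked before it ever reaches the user.
    """
    transcript = transcribe(audio).lower()
    return not any(term in transcript for term in BLOCKED_TERMS)

# Example with a fake transcriber:
def fake_transcribe(_audio: bytes) -> str:
    return "a harmless weather report"

print(audio_output_allowed(b"\x00", fake_transcribe))  # True: allowed
```

The appeal of this design is that it reuses mature text-moderation tooling: however the audio was coaxed out of the model, the transcript still has to pass the same text filter.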