
OpenAI’s voice cloning AI model only needs a 15-second sample to work

Via FryAI

“What’s new? OpenAI has unveiled a preview of Voice Engine, a revolutionary tool which offers synthetic speech generation capabilities.
How does it work? From just a 15-second recording, Voice Engine is capable of mimicking speech that sounds very similar to the original voice. This voice can be used to read text prompts, even if they are not in the speaker’s native language.
The beautiful:
This Voice Engine technology exhibits vast potential across various domains, from aiding children with reading to assisting in vocal restoration. If used ethically, this breakthrough could be applied in a plethora of positive ways.
The ugly:
Experts and laypeople alike have raised alarms about potential misuse of this technology, which could make fraudulent activities such as unauthorized voice imitation and ever more deceptive deepfakes easier to carry out.
When will it be available? OpenAI, cognizant of the risks associated with Voice Engine, has opted for a cautious rollout, prioritizing responsible deployment and soliciting feedback from diverse stakeholders before putting this powerful tool in the hands of the public.”

OpenAI’s voice cloning AI model only needs a 15-second sample to work

Called Voice Engine, the model has been in development since late 2022 and powers the Read Aloud feature in ChatGPT.

Via AI Tool Report

 “OpenAI can replicate your voice with a 15-second sample

Our Report: OpenAI has launched a voice cloning AI model called Voice Engine, which can replicate a person’s voice from just a 15-second sample.
 Key Points:
  • Voice Engine has been in development since late 2022 and can produce synthetic voices that read text prompts in the speaker’s language or other languages, broadening its application scope.
  • Initial access is granted to select companies, including Age of Learning and HeyGen, showcasing the model’s potential in educational and storytelling contexts.
  • OpenAI emphasizes the importance of ethical usage, requiring partner companies to obtain consent for voice cloning and to inform users about using AI-generated voices (although this will do little to deter those with malicious intent).
  • Amidst growing concerns over AI voice misuse, OpenAI is implementing measures like audio watermarking and monitoring to ensure responsible deployment.
 Why you should care: OpenAI’s Voice Engine represents a significant advancement in voice cloning technology, highlighting the potential for innovative applications and addressing ethical and security challenges.”

Via The Neuron

“OpenAI’s new AI can say anything in your voice.

OpenAI just announced Voice Engine, an AI capable of replicating any voice in any language based on a brief audio sample.
In layman’s terms, upload a 15-second sample of Pete speaking → Generate Pete saying something new while maintaining his “vibe.”
See it in action:
The tech isn’t new (shoutout ElevenLabs and In fact, the same researcher who built Voice Engine had also built an earlier tool that ended up being the tech behind
Still, more people know OpenAI than the other two. So what does this mean in practice?
  • Translating content into other languages. HeyGen already uses Voice Engine to translate product marketing / sales videos so businesses can reach a global audience.
  • Testing hundreds of versions of marketing / audio ads featuring different voices/languages to see which performs the best prior to launch.
  • AI-generated podcasts and audio content. Perplexity already offers a podcast called “Discover Daily” that uses an AI voice from ElevenLabs to curate news summaries.
On the flip side, this technology also poses significant risks, granting bad actors the potential to misuse someone’s voice—a dilemma we’re all too aware of by now.
Remember those deepfake robocalls that imitated Biden discouraging voting in New Hampshire? Or the scammer who, posing as an employee’s superior, swindled $25.6M?
One can only imagine the awkwardness of explaining that scenario to their employer…
For the time being, OpenAI plans to keep Voice Engine under wraps until it’s ready for widespread deployment.”

Via Unwind AI

“OpenAI Teases Again with a New Voice Cloning Model

OpenAI has been releasing demos of its text-to-video model Sora and we’re eagerly waiting for it to be publicly available. But before the anticipation could settle, OpenAI has announced a new model but it’s not available for use, again. The company has developed Voice Engine, an AI model for voice cloning that uses a 15-second audio sample and text input to almost perfectly clone the voice.

Key Highlights:

  1. Development and Testing: Voice Engine was first developed in late 2022. It is being tested with “trusted partners” for applications like reading assistance for non-readers and children, content translation, and improving essential service delivery in remote settings.
  2. Training and Data Use: The model is trained on a mix of licensed and publicly available data, with details on the training data being closely guarded considering the ramification of copyright issues.
  3. Editing: Voice Engine currently doesn’t allow editing the generated output. There are no options for adjusting the tone, pitch, or cadence of the voice.
  4. Pricing: Voice Engine will cost $15 per 1 million characters. That is quite cheap compared with the current industry leader, ElevenLabs, which charges $11 per month for 100,000 characters but also provides editing features. (Source)
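
To make the pricing comparison in the last item concrete, here is a rough sketch that normalizes both quoted prices to dollars per 1 million characters. It uses only the two figures quoted above. The function name is hypothetical, and treating the ElevenLabs monthly allotment as a simple per-character rate is an assumption, since a subscription is not billed the same way as usage-based pricing.

```python
# Prices as quoted in the excerpt above; actual plans may differ.
VOICE_ENGINE_USD = 15      # quoted price for 1,000,000 characters
VOICE_ENGINE_CHARS = 1_000_000
ELEVENLABS_USD = 11        # quoted monthly price for 100,000 characters
ELEVENLABS_CHARS = 100_000


def usd_per_million_chars(price_usd, chars):
    """Normalize a quoted price to dollars per 1 million characters (hypothetical helper)."""
    return price_usd * 1_000_000 / chars


voice_engine_rate = usd_per_million_chars(VOICE_ENGINE_USD, VOICE_ENGINE_CHARS)
elevenlabs_rate = usd_per_million_chars(ELEVENLABS_USD, ELEVENLABS_CHARS)
ratio = elevenlabs_rate / voice_engine_rate

print(f"Voice Engine: ${voice_engine_rate:.2f} per 1M characters")
print(f"ElevenLabs:   ${elevenlabs_rate:.2f} per 1M characters")
print(f"ElevenLabs is roughly {ratio:.1f}x the quoted Voice Engine rate")
```

On these quoted numbers, the ElevenLabs plan works out to about seven times the Voice Engine rate per character, before accounting for the editing features it includes.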

Following is an example of translation from the HeyGen platform that is using OpenAI’s Voice Engine model.

Reference Audio:


Generated Audio in German:


OpenAI unveils AI voice cloning tool

Via The Rundown AI: “OpenAI has unveiled a preview of Voice Engine, a model that can clone human voices from a 15-second audio sample and generate natural-sounding speech.
The details:
  • The model is able to preserve the accent and emotions of the original speaker in generated speech.
  • Voice Engine is currently being tested by a small group of trusted partners, including AI startup HeyGen.
  • OpenAI has implemented safety measures like watermarking and proactive monitoring to prevent misuse.
  • The company revealed it first developed the tech in late 2022 and has been using it to power voices in its text-to-speech API and ChatGPT.
Why it matters: OpenAI is clearly far ahead in the space, with Voice Engine being deployed internally since 2022. However, with no public release in sight, the company seems to understand the risks, such as deepfake scams during an election year.”

Via OpenAI

Navigating the Challenges and Opportunities of Synthetic Voices

We’re sharing lessons from a small scale preview of Voice Engine, a model for creating custom voices.



Posted on: April 1, 2024, 8:35 am Category: Uncategorized
