Hey Guys,
I'm as bullish on A.I. tools as the next A.I. evangelist, but it seems I'm more wary of the dangers as well. ElevenLabs, a London-based speech AI startup, recently secured $2 million in pre-seed funding and has also released a beta version of its text-to-speech platform for English and Polish.
The round was led by Credo Ventures, with participation from Concept Ventures and angel investors, including Peter Czaban and Tytus Cytowski. So what went wrong?
The company released its first text-to-speech open beta system on Jan. 23. Developers promised voices that match the style and cadence of an actual human being. The company's "Voice Lab" feature lets users clone voices from small audio samples. However, bad actors have used the tool to generate celebrities saying all sorts of unseemly things, a clear form of deepfake abuse.
Audio Cloning is Dangerous
ElevenLabs is designed for turning long-form text into audio using both clones of existing voices and entirely synthesized models of human speech. The AI mimics human emotion in reading the text, using context clues to decide the mood and adjusting tone and inflection to match.
The potential for abuse on a deepfake-riddled internet is enormous, though. Even the startup itself seems perplexed about what it has set into motion:
AI voice generator used to deepfake celebrities spewing racist abuse
So how did the story break? Motherboard first reported Monday on a slew of 4chan users who were uploading deepfaked voices of celebrities and internet personalities, from Joe Rogan to Robin Williams.
4chan members used ElevenLabs to make deepfake voices of Emma Watson, Joe Rogan, and others saying racist, transphobic, and violent things.
Without A.I. regulation in the U.S., there is a real risk in how all of these generative A.I. tools will be used and what bad actors will be tempted to use them for. By releasing a hallucination-prone ChatGPT demo for free, OpenAI set a virus loose in the wild: a precedent for what A.I. labs will do - that is, release things without A.I. alignment and A.I. risk mitigation.
What Kind of Deepfakes?
The generated audio ranges in content from memes and erotica to virulent hate speech.
The CEO and co-founder is Mati Staniszewski. The whole situation is a bit bizarre.
They just raised $2 million in a pre-seed round. A portion of the money is going toward automated dubbing research; the company's long-term objective is to make speech understandable in any language.
It's a pity they get deepfake PR like this, since they claim to "generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there." Their deep learning model does seem to render human intonation and inflections with unprecedented fidelity and to adjust delivery based on context.
ElevenLabs developed proprietary deep learning models to create its AI-delivered speeches. The startupās synthetic voices employ natural language understanding to grasp the context of what a person is saying.
AI model is built to grasp the logic and emotions behind words
Are we to think that deepfakes are the sincerest form of flattery in this context?
The AI might spot adjectives describing someone's speech as cheerful or sad, or note the setting of a wedding or a traffic jam, and adjust the delivery accordingly. It can even understand humor and sarcasm well enough to laugh when something is meant to be funny (or at least written to imply it should be).
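To make that "context clues" idea concrete, here is a toy sketch of the concept: scan the surrounding text for mood adjectives and pick a delivery tone. ElevenLabs' real model learns this end-to-end from data; this keyword lookup, including the `MOOD_CUES` table and `infer_tone` name, is purely illustrative.

```python
# Toy illustration of context-driven delivery: mood adjectives in the
# text select a tone setting. Not ElevenLabs' actual mechanism.
MOOD_CUES = {
    "cheerful": "bright", "excited": "bright",
    "sad": "somber", "grim": "somber",
    "sarcastic": "wry",
}

def infer_tone(text: str, default: str = "neutral") -> str:
    """Return a delivery tone based on the first mood cue found."""
    for word in text.lower().split():
        cue = MOOD_CUES.get(word.strip(".,!?\"'"))
        if cue:
            return cue
    return default
```

A real system would also weigh scene descriptions (the wedding, the traffic jam) and sentence-level sentiment rather than single keywords.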
On Monday, ElevenLabs, founded by ex-Google and Palantir staffers, said it had found an "increasing number of voice cloning misuse cases" during its recently launched beta. Yes, that's what usually happens in open betas - as if they didn't expect this?
They claim their AI model is built to grasp the logic and emotions behind words. Rather than generating sentences one by one, it is always mindful of how each utterance ties to the preceding and succeeding text. This zoomed-out perspective allows it to intonate longer fragments convincingly and with purpose. And, finally, you can do this with any voice you want.
"What we do differently is we take text and the context of what you write to generate the tonality of voices. It understands the text and can know how to speak the [emotions] correctly," ElevenLabs co-founder and CEO Mati Staniszewski told Voicebot in an interview. "It works exceptionally well on longer-form texts because it can preserve that context. No others are taking that kind of context into consideration. We also stand out in how we approach how to replicate or clone a voice. We developed a cloning module that doesn't require training, only a few seconds of recording, though ideally a full minute."
The company said that while they could trace any generated audio back to the user, they were choosing to address the problem by implementing additional safeguards.
Additional account verification to enable voice cloning (such as payment info or even full ID verification) and verifying copyright to a voice by submitting a sample with prompted text were presented as potential safeguards.
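The traceability claim above is worth unpacking: if every generated clip is fingerprinted and tied to the account that made it, a clip found on 4chan can be traced back. Here is a minimal conceptual sketch of such a registry; the class name and exact-hash approach are my assumptions, not ElevenLabs' implementation (a production system would need a perceptual hash robust to re-encoding, plus watermarking).

```python
import hashlib
from typing import Optional

class AudioProvenanceRegistry:
    """Toy registry: fingerprint each generated clip so it can be
    traced back to the account that created it (conceptual only)."""

    def __init__(self) -> None:
        self._index: dict[str, str] = {}  # fingerprint -> user_id

    @staticmethod
    def fingerprint(audio_bytes: bytes) -> str:
        # Exact SHA-256 of the raw audio; breaks if the clip is
        # re-encoded, which is why real systems use perceptual hashes.
        return hashlib.sha256(audio_bytes).hexdigest()

    def record(self, audio_bytes: bytes, user_id: str) -> str:
        fp = self.fingerprint(audio_bytes)
        self._index[fp] = user_id
        return fp

    def trace(self, audio_bytes: bytes) -> Optional[str]:
        return self._index.get(self.fingerprint(audio_bytes))
```

The design trade-off is the one the company faces: tracing only works after the harm, while the verification safeguards above try to raise the cost of abuse up front.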
Users of ElevenLabs' platform can choose from the company's library of synthetic voices or quickly create a clone of a human voice.
Currently, ElevenLabs provides a monthly subscription service with a free tier for experimenting with the technology and a standard tier for content creators.
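For developers on those tiers, usage boils down to an authenticated HTTP call. The sketch below only assembles (and never sends) such a request; the endpoint path, `xi-api-key` header, and body fields are assumptions based on ElevenLabs' public API docs at the time, so treat them as illustrative rather than authoritative.

```python
def build_tts_request(api_key: str, voice_id: str, text: str) -> dict:
    """Assemble (but do not send) a text-to-speech HTTP request.
    Field names are assumptions modeled on ElevenLabs' public docs."""
    return {
        "method": "POST",
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": {
            "xi-api-key": api_key,  # per-account key; this is what enables tracing
            "Content-Type": "application/json",
        },
        "json": {
            "text": text,
            "voice_settings": {"stability": 0.75, "similarity_boost": 0.75},
        },
    }
```

Note that the API key is exactly the hook the company pointed to when it said generated audio could be traced back to a user.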
Ethical AI
At Eleven, they say they believe we should strive to make the most of new technologies, but not at any cost. As it develops them, the company claims to make every effort to implement appropriate safeguards that minimize the risk of harmful abuse. With this in mind, they're fully committed both to respecting intellectual property rights and to actioning misuse.
Time will tell if that's true. But abuse of generative A.I. tools will remain a hot-button issue.
Thanks for reading!