Hey Guys,
I'm as bullish on A.I. tools as the next A.I. evangelist, but it seems I'm more wary of the dangers as well. ElevenLabs, a London-based speech AI startup, recently secured $2 million in pre-seed funding and has also released a beta version of its text-to-speech platform for English and Polish.
The round was led by Credo Ventures, with participation from Concept Ventures and angel investors, including Peter Czaban and Tytus Cytowski. So what went wrong?
The company released its first text-to-speech open beta system on Jan. 23. Developers promised voices that match the style and cadence of an actual human being. The company's "Voice Lab" feature lets users clone voices from small audio samples. However, bad actors have used the tool to generate celebrities saying all sorts of unseemly things, a clear form of deepfake abuse.
Audio Cloning is Dangerous
ElevenLabs is designed for turning long-form text into audio using both clones of existing voices and entirely synthesized models of human speech. The AI mimics human emotion in reading the text, using context clues to decide the mood and adjusting tone and inflection to match.
The potential for abuse on a deepfake-riddled internet is enormous, though. Even the startup itself seems perplexed about what it has set into motion:
AI voice generator used to deepfake celebrities spewing racist abuse
So how did the story break? Motherboard first reported Monday on a slew of 4chan users who were uploading deepfaked voices of celebrities and internet personalities, from Joe Rogan to Robin Williams.
4chan members used ElevenLabs to make deepfake voices of Emma Watson, Joe Rogan, and others saying racist, transphobic, and violent things.
Without A.I. regulation in the U.S., there is a real risk in how all of these generative A.I. tools will be used and what bad actors will be tempted to use them for. By releasing a hallucination-prone ChatGPT demo for free, OpenAI set a virus loose in the wild: a precedent for what A.I. labs will do - that is, release things without A.I. alignment and A.I. risk mitigation.
What Kind of Deepfakes?
The generated audio ranges in content from memes and erotica to virulent hate speech.
The CEO and co-founder is Mati Staniszewski. The whole situation is a bit bizarre.
They just raised $2 million in a pre-seed round. A portion of the money is going toward automated dubbing research; the company's long-term objective is to make speech understandable in any language.
It's a pity they get deepfake PR like this, since they claim to "generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there." Their deep learning model does seem to render human intonation and inflections with unprecedented fidelity and to adjust delivery based on context.
ElevenLabs developed proprietary deep learning models to create its AI-delivered speeches. The startupās synthetic voices employ natural language understanding to grasp the context of what a person is saying.
AI model is built to grasp the logic and emotions behind words
Are we to think that deepfakes are the sincerest form of flattery in this context?
The AI might spot adjectives describing someone's speech as cheerful or sad, or note the setting of a wedding or a traffic jam, and adjust the delivery accordingly. It can even understand humor and sarcasm well enough to laugh when something is meant to be funny (or at least written to imply it should be).
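To make that "context clues" idea concrete, here is a toy sketch of the concept: scan the surrounding text for mood adjectives and pick a delivery tone. ElevenLabs' real model learns this end-to-end from data; this keyword lookup, including the `MOOD_CUES` table and `infer_tone` name, is purely illustrative.

```python
# Toy illustration of context-driven delivery: mood adjectives in the
# text select a tone setting. Not ElevenLabs' actual mechanism.
MOOD_CUES = {
    "cheerful": "bright", "excited": "bright",
    "sad": "somber", "grim": "somber",
    "sarcastic": "wry",
}

def infer_tone(text: str, default: str = "neutral") -> str:
    """Return a delivery tone based on the first mood cue found."""
    for word in text.lower().split():
        cue = MOOD_CUES.get(word.strip(".,!?\"'"))
        if cue:
            return cue
    return default
```

A real system would also weigh scene descriptions (the wedding, the traffic jam) and sentence-level sentiment rather than single keywords.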
On Monday, ElevenLabs, founded by ex-Google and Palantir staffers, said it had found an "increasing number of voice cloning misuse cases" during its recently launched beta. Yes, that's what usually happens in open betas - as if they didn't expect this?
They claim their AI model is built to grasp the logic and emotions behind words. Rather than generating sentences one by one, it is always mindful of how each utterance ties to the preceding and succeeding text. This zoomed-out perspective allows it to intonate longer fragments convincingly and with purpose. And, finally, you can do this with any voice you want.
"What we do differently is we take text and the context of what you write to generate the tonality of voices. It understands the text and can know how to speak the [emotions] correctly," ElevenLabs co-founder and CEO Mati Staniszewski told Voicebot in an interview. "It works exceptionally well on longer-form texts because it can preserve that context. No others are taking that kind of context into consideration. We also stand out in how we approach how to replicate or clone a voice. We developed a cloning module that doesn't require training, only a few seconds of recording, though ideally a full minute."
The company said that while they could trace any generated audio back to the user, they were choosing to address the problem by implementing additional safeguards.
Additional account verification to enable voice cloning (such as payment info or even full ID verification) and verifying copyright to a voice by submitting a sample with prompted text were presented as potential safeguards.
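The traceability claim above is worth unpacking: if every generated clip is fingerprinted and tied to the account that made it, a clip found on 4chan can be traced back. Here is a minimal conceptual sketch of such a registry; the class name and exact-hash approach are my assumptions, not ElevenLabs' implementation (a production system would need a perceptual hash robust to re-encoding, plus watermarking).

```python
import hashlib
from typing import Optional

class AudioProvenanceRegistry:
    """Toy registry: fingerprint each generated clip so it can be
    traced back to the account that created it (conceptual only)."""

    def __init__(self) -> None:
        self._index: dict[str, str] = {}  # fingerprint -> user_id

    @staticmethod
    def fingerprint(audio_bytes: bytes) -> str:
        # Exact SHA-256 of the raw audio; breaks if the clip is
        # re-encoded, which is why real systems use perceptual hashes.
        return hashlib.sha256(audio_bytes).hexdigest()

    def record(self, audio_bytes: bytes, user_id: str) -> str:
        fp = self.fingerprint(audio_bytes)
        self._index[fp] = user_id
        return fp

    def trace(self, audio_bytes: bytes) -> Optional[str]:
        return self._index.get(self.fingerprint(audio_bytes))
```

The design trade-off is the one the company faces: tracing only works after the harm, while the verification safeguards above try to raise the cost of abuse up front.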
Users of ElevenLabs' platform can choose from the company's library of synthetic voices or quickly create a clone of a human voice.
Currently, ElevenLabs provides a monthly subscription service with a free tier for experimenting with the technology and a standard tier for content creators.
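For developers on those tiers, usage boils down to an authenticated HTTP call. The sketch below only assembles (and never sends) such a request; the endpoint path, `xi-api-key` header, and body fields are assumptions based on ElevenLabs' public API docs at the time, so treat them as illustrative rather than authoritative.

```python
def build_tts_request(api_key: str, voice_id: str, text: str) -> dict:
    """Assemble (but do not send) a text-to-speech HTTP request.
    Field names are assumptions modeled on ElevenLabs' public docs."""
    return {
        "method": "POST",
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": {
            "xi-api-key": api_key,  # per-account key; this is what enables tracing
            "Content-Type": "application/json",
        },
        "json": {
            "text": text,
            "voice_settings": {"stability": 0.75, "similarity_boost": 0.75},
        },
    }
```

Note that the API key is exactly the hook the company pointed to when it said generated audio could be traced back to a user.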
Ethical AI
At Eleven, they say they believe we should strive to make the most of new technologies, but not at any cost. As it develops them, the company claims to make every effort to implement appropriate safeguards that minimize the risk of harmful abuse. With this in mind, they're fully committed both to respecting intellectual property rights and to actioning misuse.
Time will tell if that's true. But abuse of generative A.I. tools will remain a hot-button issue.
Thanks for reading!