Oct 16, 2023
Like it or not, AI is here, and it will only get better. Where does that leave Voice Artists, Podcasters and Content Creators who currently have no protections in terms of owning their voice?
Tim Friedlander is an award-winning voice actor, studio owner, advocate, and educator. Tim is also the Founder and President of NAVA, the National Association of Voice Actors, as well as co-owner and editor of The Voice Over Resource Guide. His work with NAVA puts him at the coal face of negotiations with the likes of Voices.com and the AI seeding debate. We have him on the show this week to give us an insight into where we might be headed in terms of a compromise, what protections we might be able to put in place, and, most troublingly, the short amount of time we have to get it done before it may effectively be too late.
A big shout out to our sponsors, Austrian Audio and Tri Booth. Both these companies are providers of QUALITY Audio Gear (we wouldn't partner with them unless they were), so please, if you're in the market for some new kit, do us a solid and check out their products, and be sure to tell em "Robbo, George, Robert, and AP sent you"... As a part of their generous support of our show, Tri Booth is offering $200 off a brand-new booth when you use the code TRIPAP200. So get onto their website now and secure your new booth...
And if you're in the market for a new mic or killer pair of headphones, check out Austrian Audio. They've got a great range of top-shelf gear.
If you haven't filled out our survey on what you'd like to hear on the show, you can do it here:
https://www.surveymonkey.com/r/ZWT5BTD
Join our Facebook page here:
https://www.facebook.com/proaudiopodcast
And the FB Group here:
https://www.facebook.com/groups/357898255543203
For everything else (including joining our mailing list for exclusive previews and other goodies), check out our website
Summary
In this episode of Pro Audio Suite, we explore the controversial
topic of AI voices with special guest Tim Friedlander. Voices.com
has reportedly promised not to use people's voices from their
database without permission, but the potential misuse of audition
files by clients remains a concern. We discuss the fairness of
voice synthesis, highlighting NAVA's call for consent and
compensation for voice actors. Listeners will gain insight into the
problematic quality of AI voice samples and the potential threat to
new voice actors as AI begins to replace human voices in certain
sectors. We also delve into the future role of agents as potential
AI voice libraries, and the necessity for clear licensing fee
structures and strong protections before the end of the year to
prevent misuse.
#VoiceAIControversy #FairVoicesCampaign #FutureOfVoiceActing
Timestamps
(00:00:00) Introduction
(00:00:43) Voices.com's Promise
(00:03:31) Copyright Laws and AI Voices
(00:11:50) Review of AI Voice Samples
(00:12:59) Risks of Recorded Audio
(00:14:25) Dangers of AI
(00:19:57) AI Replacing Human Voices
(00:23:26) AI's Impact on Visual Artists
Transcript
Speaker A: Y'all ready be history.,Speaker B: Get started.,Speaker
A: Welcome.,Speaker B: Hi. Hi. Hello, everyone, to the Pro Audio
Suite.,Speaker A: These guys are professional and motivated with
tech. To the VO stars: George Whittam; founder of Source Elements,
Robert Marshall; international audio engineer Darren "Robbo"
Robertson; and global voice Andrew Peters. Thanks to Tribooth;
Austrian Audio, making passion heard; Source Elements; George "the
Tech" Whittam; and Robbo and AP's International Demo. To find out more
about us, check theproaudiosuite.com.,Speaker B: Learn up learner.
Here we go.,Speaker C: And welcome. And don't forget, if you want
to get a discount of $200 off your Tribooth, TRIPAP200 is the code
you need. Now, this week, very topical. Of course, this AI thing
will just not go away. And I know that there was a conversation
about that place. I don't even like saying it. Anyway, I will say
it. Voices.com supposedly have promised not to farm out people's
voices from their database. Tim Friedlander has been involved in
this and has written an article, which is what I saw. And Tim is
joining us. G'day Tim.,: Hello. Hello. I'm here.,Speaker C: So
what's the backstory to this and how did you get involved?,: The
backstory to the AI voices.com thing goes back to about May when
David Ciccarelli and Voices.com announced that they were releasing Voices
AI. And for the voice acting community, that was a huge concern,
basically for the main part being that many people have been
uploading audio to their website through their website for 20
years. So theoretically, Voices.com or either of these sites has 20
years of very high quality data and audio that they could use to
synthesize our voices. So through NAVA, which is the association that I
run along with Carin Gilfry and a board of directors, we reached
out to David and Stephanie and had a week of conversations with
them to get the assurance that they had never been uploading or
using or doing anything with auditions or files that have been
uploaded through their website. And out of that came our Fair
Voices campaign or the Fair Voices pledge that we launched. And we
reached out to the other online casting sites, six other sites, to
get the same assurances from them and also to make sure that they
had changed their terms of service. So Voices.com at the time
changed their terms of service to very explicitly say they would
not be using any audio files uploaded through their site for
machine learning or synthesized or synthesizing voices.,Speaker C:
Was that backdated or is that from that point onward?,: The terms
of service were from that point onward, but they publicly at the
time and in various blog posts and other written areas have said
that they have never used audio files for that. The caveat being is
that once the audio files are uploaded and sent to a client, it's
possible that the client then could take those audition files and
use them. We don't know and haven't seen any companies per se who
we know are doing that, but over the last ten or so years, a lot of
these companies have been working in the AI TTS sphere and very
potentially could have been using that audio for training. We
haven't seen it yet explicitly that we know of, but the inability
to track our audio files and to know where the audio goes once
we've emailed it out or uploaded through a website makes that a
real possibility.,: So to give this some perspective, is there any
sort of copyright law or anything in place at the moment that
protects someone from having their voice turned into an AI voice
without their permission?,: That's a great question. Short answer
is no. We've been working with the Copyright Office. I gave a
presentation to the FTC last week at a roundtable. I've spoken with
multiple lawyers and people across the country and across the
world. We're working with a group in Europe to help with the EU AI
Act. Most actors, voice actors, we give away our files as a work
for hire, and the understanding is that that audio will be used for
this very specific project. Unfortunately, that also basically
gives the person we've given the audio file to the copyright and
the ability to do whatever they want to with that. We're currently
looking at the possibility that since most voice actors record from
home, if from like a music perspective, we could theoretically be
the owners of the master files, because a lot of times there's no
contracts that are signed. But we're in the early
stages of exploring that. But copyright law does not
currently protect the voice actor. It protects the copyright
holder, which 99% of the time is the company who hired us. Wow. The
only other thing we could fall back on is right of
publicity. But the strongest of those laws are really only in
California and New York. And then there's possibly biometric
and privacy laws, but those are really only strongest in Illinois
and Texas, of all places, for privacy rights.,Speaker C: So is there a
way of, you know, we've talked about this before, having some kind of
fingerprint of, you know, if anybody uses your voice, it's quite
obvious it's yours because it shows some kind of a fingerprint in
the waveform, potentially. I don't know how that would work, but
there must be someone who's got something.,: Nobody does currently
that we know of. I've spoken with people at DARPA and at NASA. We
are currently working. We've gone very deep in this conversation to
try and figure out a way to do this, what we can do. And actually,
I'm working on this with another company that I started about three
years ago to create voice prints that we can then use to match a
human voice to a synthetic voice and also to match a human voice to
a human voice to say that they're the same person. You could
theoretically, if we can get that software in place, lock down a
voice. So if somebody tries to upload it to a synthetic voice site,
it would be locked and would be flagged. Basically, essentially
DRM for voice is what we're trying to do. But the only thing really
that you could do that might stay is some kind of spread-spectrum
watermarking that you could do within that. But it'd have to be
embedded so deeply in there that you could rip this into Pro Tools
or rip it into something else, right, and transfer it between audio
files or different DAWs, and not strip it out. If it's frequency, then it's
very easy to pull out frequencies. Most of the watermarking that's out
there is pretty easy to bypass currently.,Speaker C:
Well, you just have to get Clarity or something and it's gone.,:
Yeah, exactly. Yeah.,: So what's the compromise future from your
perspective then? Would it be a point where Darren Robertson is
selling his voice sample disc to AI people? Or would you rather not
see AI at all?,: I'm a musician primarily. I was in Seattle,
on the cusp of playing live and really exploring music, when
Napster and everything hit. And from a consumer perspective, that
was one of the most eye-opening things that I'd ever seen: the
ability to now have access to a massive amount of audio that I'd
never heard before. I'm not anti-technology by any means, and definitely
not anti-AI. I've worked with a synthetic voice company. I
know people who are working with synthetic voice companies. The
issue right now is that a lot of the foundational models, a lot of
the foundations of these AI generative engines, synthetic voice
engines are built on somebody's data and more than likely they are
being built on the literal voices of voice actors. So we become the
foundation of a lot of these models. What NAVA has been asking for
is consent, control and compensation. And it's the same thing that
all artists are asking for, musicians are asking for, models are
asking for, is if you're going to take my data and what makes the
essence of me. My voice or my image, or the way I walk or the way
that I speak, the cadence that I have, the way that I stand. All of
those things are very personal to all of us individually. And that
data is basically being turned into data, right. What makes us is
being turned into data and put into these synthetic voice engines
or these synthetic generative engines or generative AI to produce
images and videos and photos and voices that are based on real
humans and sound like and look like real humans. So we try to find
consent, control and compensation for those and really consent to
say yes or no. You can make a synthesized version of my
voice.,Speaker C: So if we're talking about AI voices, we're not
going to stop. It's already out. I mean, the thing's going to
happen.,: They're out there. Yes, correct.,Speaker C: How do you
perceive we control it?,: The only thing that we can currently do
right now. And this is part of what this discussion at the FTC came
up with last week, is really, I think, from a consumer perspective,
a consumer safety perspective, I think that there is so much danger
in disinformation and false information and just absolute lies
that are out there that can now be easily replicated and put into a
video or an audio or something that is not very easily detectable.
It's almost impossible to tell a synthetic voice from a human voice
that are done well. It's hard to tell a synthetic image from a
factual image. Laws and
legislation, I think, are currently the only thing that we can
really do on a broad scale to help stem the tide of the damage
that's been done already. And going forward, we have to have very
clear contracts and agreements in place that either do or do not
allow for the use of somebody's voice to be used in a synthetic
voice or generative AI. That's partially what the WGA and SAG-AFTRA
strikes are about. AI is the top of that list of things that
are concerns, and it's a top concern for anybody who is in the arts
right now that creates anything that any of that could be put into
a synthetic engine of some kind and have a new creation made out of
that. We just came out of a pandemic where we relied on artists, on
musicians and filmmakers and actors and voice artists. And the
first thing we do out of that pandemic is try and replace those
people. That's really essentially what's happening. There is some
accessibility. There are places that there is an argument to be
made for doing things that a human couldn't generate. But when it's
done to replace somebody, when it's done just to save money, that's
where the concern comes in. And we know that money, those savings,
are not going to be passed along to the consumer. A video game is
not going to be cheaper for somebody to buy because it has
synthetic voices. A movie is not going to be cheaper at the movie
theater because it's synthetically generated. So they cut out the
people. They cut out the people who actually make this work, and
then that money just goes to the company that gets to save that
money at the expense of everybody.,: Why would voices.com say the
quiet part out loud? They're a bit like Uber basically going like,
hi, please work for us. Make us money, and then we're going to put
all of our money into figuring out how to make driverless cars so
we don't need you see bitches.,Speaker C: Yeah, exactly.,: They
did. I don't know if anybody saw the news last week, but David
Ciccarelli is out, and Morgan Stanley (is it Morgan Stanley who was
the venture capital?), whoever gave them the money, they replaced him
at the top. My guess is that they either went all in on AI and it's
not paying off, or they weren't seeing... This is all purely
speculation. This is just what we can have for conjecture in this
place. So I know nothing for fact, but they invested a massive
amount of money in them, what, $18 million, $15 to $18 million, seven years
ago. And if they went all in on AI, I don't know if anybody's
heard.,: They lost all of it.,: Yeah, they lost all of it. Has
anybody actually... Have you guys heard their AI? The Voices AI, their
samples? They're terrible.,: Never heard it.,: They're terrible.
They are terrible. But they were done with consent, control and
compensation.,: Is it better or worse than voicealo?,: I haven't
heard that one. But most of what I deal with, ElevenLabs
and PlayHT are the two that I use most often, for example,
for samples in that. And both of those are phenomenal. They are
really good. And Voices AI is nowhere. It sounds about ten years
old, the technology, from what I heard, and some of the voice
actors who had their voices synthesized, who participated in this
are not happy with how that voice sounds.,Speaker C: Yeah, I was
going to say, just to lighten up a bit, there's an old gag that
could actually be modernized and you can ask the question, how many
voiceover artists does it take to change a light bulb? And the
answer is none. You get an AI to do it.,: That was a drummer
joke.,Speaker C: I know. We can update it.,: It just hasn't
happened quite yet.,: I was going to say. Yeah, exactly. I've heard
that one before somewhere. So the thing that occurs to me though,
Tim, is it's great that we're protecting voice actors and all that
sort of stuff, but obviously there's a crapload more voice samples
out there. I mean, how many podcasts are there out there? And
YouTube content creators and all the rest of it? All these places
they could go mining for voices.,: How do we protect, you know...
Currently we can't. Currently there is no protection for, you know... This
goes into, you know, we talk about this being more... Anybody who
has recorded audio is at risk. And voice actors just happen to
be the ones who make a living off of our recorded voice most of the
time, but doesn't mean that others aren't making a living off of
what they have on the podcast and YouTube. And even those who are
just hobbyists at this, who just have a little bit of recorded
audio, some twitch stream. I can currently record all the audio off
this and make a synthetic voice of anybody on this conversation
right now, as can anybody who's listening to it.,Speaker B:
Right.,: And it's easy.,: What work does it really kill, truly
kill? Like in the short term? I can see it taking out a crapload of
e-learning and other things like that.,: It takes that out. That's
any of the stuff that is purely factual. A lot of times we talk about
factual stuff where I just need information read. A lot of that
stuff gets taken out right away, which if you can license your
voice to that, then you can still have a career as a voice actor.
One of the things that I think is the dangerous part of this, and
this goes for any of the arts, is that a lot of these places that
are going to be replaced first are where a lot of voice actors, a
lot of artists learn. This is how you cut your teeth and you come
up through the industry. You do the free jobs, you do the cheap
jobs, you do the entry level jobs. Those entry level jobs go away
right away because it's cheaper. But a lot of the times it's
better. Unfortunately, it is better. The audio quality of a voice
actor who's just starting out, who is using a USB mic in their
living room with hardwood floors and the refrigerator running and
the AC is going to be at risk for sure, and I think rightfully
so.,: I'll give you another one, is the company that doesn't hire
anybody, right. And they just see the AI voices as it's better than
having Mary Jo read it because it's going to take her a long time
and whatever. And so just type it into the system. And there's our
video. It's our instruction video on how to use our garden hose
or something.,: Absolutely.,: And yeah, it's going to take out... I don't
see it initially taking out real voice acting, but I agree, just
like conveying voice, it's just going to... Plenty of AI voices I'd
rather hear instead of the president of the auto workers
union, for example.,: One of the things that we've seen, I think,
that's been most hopeful in this is that those who work with voice
actors already or don't want to replace voice actors, those people
who are already working in the creative sphere, who are the
producers, who are the directors, they're the people they say, I
would never replace a voice actor. But it's all of those people who
don't, who just need a voice actor for this one time, need a
voice actor for this one training video, this one thing here that
they would go to a friend or a referral or wherever it might be, to
the online casting site and cast somebody who's new. They're not
going to do that anymore. And we're not going to see it's very hard
to tangibly find the damage to this because we're not seeing
auditions going out where they're saying we're going to audition a
human versus an AI. And the AI gets the job. They're just not even
going to bother to do the auditions in the first place. And we're
never even going to know if it was a synthetic voice. So this is
partially why, again, laws and legislation. There's a Senate bill
out that NAVA is endorsing, Senate Bill 2691, which is the AI Labeling
Act of 2023, which is going to require anything AI generated to
be labeled, marked, same thing as you would with food. I think
consumers have a right to know if what they're taking in is
synthetic or human, whether it's emotional, spiritual, food. We
have a right to know what we're interacting with. I think.,: I want
to know when I'm in the Matrix personally.,: Right, exactly. Yeah,
you want to know you're in the Matrix.,: I'm sure it puts to bed a
lot of political issues. I mean, you know, imagine sitting there
listening to a radio broadcast of Joe Biden declaring war on Russia
when it's actually not really... You know, there's all sorts of issues
that this raises.,: Well, that as well, but also it raises the
possibility of doubt. And the Donald Trump tape from years ago,
he could say, well, I never said that, that's a synthetic voice, and
prove that it's not my synthetic voice. Prove I actually said that.
Right, so you're running into proving to both sides of that and
we're coming into election.,: All sorts of possibilities raised,
considering some of the possible candidates, right?,: Yes,
absolutely.,Speaker C: Is there a way for a voice actor to say,
okay, I'm going to actually upload to, say, someplace where you can
license a voice from? You actually give them all the information of
your voice and then there's a license fee. If people want to grab
it and use it for something, then they pay you a license fee the
same as you would do with library music.,: Absolutely. I've been
pushing that example for a while. I think that one of the ways that
both Europe with the GDPR and with FTC are approaching this is that
we don't need to make new laws or new regulations. We just need to
enforce the ones that exist and put this into use. The precedent, I
think the precedent of music licensing can directly go into voice.
You have a licensing fee, you have a usage fee, you have a
generation fee. If you generate new content from this, then I get
paid a certain amount for the generation. There's companies out
there that do that. VocaliD, Veritone, was one of the earlier ones
that did that. And there's a licensing fee that they have in place
for that. And the actors who do that have the consent to know where
their voice goes. We're working with a TTS company who reached out
to us and we're helping them with this exact same thing of helping
to license their deployments so that the voice actor knows where
their voice is being used, but also get paid for the original
creation of that model and then know where the voice goes from
there. There's lots of possibilities.
Unfortunately, none of those things really exist right now. The
only possibility that's happening is people can just upload your voice
anywhere they want to create a synthetic voice and use it. And
there's nothing really stopping anybody, even the AI sites. Right
now, all you have to do is click a button that says, yes, I have
the right to upload this voice.,: And at what point do you stop?,:
I mean, at what point do you stop anybody?,: If you blend two
people's voices or three people's, at a certain point, you're, like...,:
It becomes, you know... I mean, that's what Siri, Alexa, Google
voice, those are. You know, they're all blended voices, multiple
people put in together to create a new voice. So now you have
to get into now you're talking about songwriting splits, right? Now
you're going to talk about splits and points on a song, right? So
I've got three voices. We all get an equal split of the usage of
that voice, or does it not become an issue because it doesn't sound
like anybody? Therefore, there's no conflict. Voice actors, you're
also going to run into conflict. Right? What if my human voice is
doing Pepsi? My synthetic voice can't do Coca-Cola. And if it does,
who's going to be held responsible? Or a voice that just
sounds like me? At what point, how do you draw the line there? How
do you even know this voice sounds a lot like me? Is it my voice or
is it not my voice? It's a voice that sounds a lot like me. Do I
get into conflict because of the similarity?,: It's just like this
actors are impersonated. It has to be like, all voices are
synthesized, right?,: Yeah, exactly.,: From a synthetic voice
saying that all voices are synthesized, including this voice.,:
Yeah. Right.,Speaker C: But can you see, like, if you look into the
future of the role of the agent, will the agent all of a sudden
become a library of voices that can potentially be used for AI?
Would that be the shift?,: I honestly have no idea. I think
there's going to be... We're already starting to see a split of
human only, no AI, and then those who are willing to have a
conversation with it and explore it. I'm not by any means
advocating to replace humans with AI voices, but we also know that
this technology has been around for years, right? And it's been
being built for the last 20 years, ten years solidly for synthetic
voices. It's here, and we can just pretend that it's not going to
have an impact and hope that it doesn't have an impact. Or we can
go directly to these companies, which is what we've been doing.
I've been speaking with the CEOs of these companies to try and talk
with them about great, this is why voice actors are concerned. This
is why artists in general are concerned. But this is what we're
concerned about. And we know you have a lot of money. ElevenLabs,
they're worth $100 million, or they got an investment of $100
million a month ago or so. Right. They have the money to pay the
voice actors fairly for the foundation. And if they can license
that, the better audio they have, the better foundational model
they can create. So if those voice actors who want to do that have
the right to say yes, it's the right to say yes as much as it is
the right to say no. You should have the right to say yes if you
want to. I think.,Speaker C: I reckon there's going to be a
scramble with voice actors all trying to get themselves uploaded
onto one of these business sites so they can be licensed out.,:
Yeah, some of them have. Right now, there's really no clear
understanding of what that licensing fee would be. We've seen
similar jobs on the casting sites that on one job is paying $500,
on the next job it's paying $20,000. And they don't appear to be
any different. We just don't have enough... A lot of people who are
casting don't have enough information to know about where those
files are going to be used. Voice actors don't know really enough
about how they're going to be used either to know what to ask, and
agents don't know what to ask either. Like just so many unknowns
out there about what to even ask to come up with what a fair usage
would be. Because there's potentially so many uses out
there that we can't even comprehend right now, that we can't even
imagine, that they could be used for. So it's really hard to
tell. A generation fee is kind of what we're looking at,
what we're really interested in.,Speaker
C: Well, it's going to be interesting to watch how this all
unfolds, but it's.,: A massive can of worms, isn't it?,Speaker C:
It is incredible.,: It is a massive can of worms. Yeah. Visual
artists are being hit massively, obviously, right now. They're some
of the hardest hit because those images are so distinctive and
the styles are so distinct that when they come out, it's
obvious it was trained on those authors. There are
multiple lawsuits against AI companies right now from
authors who have had their books ingested into these and used as
foundational models to train these things. And the thing is, once
it's trained, you can't untrain it.,: Well, AP, was it you saying
that there's a film in the can starring James Dean?,Speaker C:
Yeah, that's what I'm told is sitting there waiting to go. So James
Dean is going to be a co-star of a new... You know, they've used motion
capture. So they've got an actor that actually can walk and move
like James Dean. They've just done a motion capture and then they
built James Dean over the top of his skeleton, so to speak. And if
that thing becomes a hit, you can see they're going to drag them
all out.,: Right.,: And then Elvis really isn't dead.,: Yeah,
right, exactly. We talk about that for VO, like speech-to-speech,
too.,: Well, that's the thing. How would you license that, Tim?,:
It's "performed by", you know: James Dean, performed by so-and-so. You want
to give the motion capture person the credit for it. Like speech-to-speech,
I could, you know... Carin Gilfry, our vice president, uses this example a
lot, which is she could narrate The Audacity of Hope and then put
Barack Obama's voice over it. So it would be the voice of Barack
Obama, performed by Carin Gilfry.,: Right.,: So, as read by Barack
Obama, performed by Carin Gilfry. Yeah. As puppetry.,Speaker B:
Yeah.,Speaker C: If I was the ad agency for 7-Eleven, I would actually
get an AI of Elvis and have him in a 7-Eleven. And finally, it's true.,:
Slurpee in one hand, donut in the other. Is that what you're
saying?,: When does Elvis become public domain?,: A long time. Long
time. It's a space to watch, isn't it? It really is.,Speaker C: And
the space will be filled by AI.,: Yeah, it's interesting. And I
think we've got three months left. I think we have about three
months before something dramatically...,: So you think there is a
time frame on this? Because I was actually sitting here thinking,
god, how long will this take to sort? But you're saying you think
there might be a time frame on it?,: I think we have, if anything,
any legitimate and strong protections need to be in place before
the end of the year. By the end of the year, it's going to be too
late for us to have any kind of protection. The technology is
moving too quickly. It's exponential. And it's going to be beyond
our control or potentially beyond the control of those who actually
are running the systems. At one point, without fully taking your
entire system offline and destroying your models, it could
potentially get to the point where there is no control, there is no
ability to consent, there is no ability to even know whose voice is
being used. They're just a multitude of generic voices that one
company gets paid when you use their voice, but nobody has any idea
who the human behind it is or where the content came from
anymore.,: Watch this space, people.,Speaker C: Yes, indeed.
Indeed. Exactly. By the way, this is actually really not me. I'm on
holiday.,: This is my not hard to do.,Speaker B: Well, that was
fun. Is it over?,Speaker A: The Pro Audio Suite, with thanks to
Tribooth and Austrian Audio. Recorded using Source Connect, edited by
Andrew Peters and mixed by Voodoo Radio Imaging, with tech support
from George "the Tech" Whittam. Don't forget to subscribe to the show
and join in the conversation on our Facebook group. To leave a
comment, suggest a topic or just say g'day, drop us a note at our
website, theproaudiosuite.com.