There are no bad voices, just untrained ones.

“There are no bad voices, just untrained ones.” – Ben Said from the Gam1ng Café

It was a nostalgic feeling to set foot in a gaming café as an adult and request a computer for the day, even if my plans differed from anything I could have imagined as a kid. Unfortunately, my laptop's graphics card was not strong enough to handle training a machine learning model, so I was grateful that the institution of my FIFA-gaming youth was able to step in and approve my off-brand request.

“So you want to make a model of your voice… and then have it… sing? Why?” – I was asked.

“Because I can’t sing. Want to hear?”

My sure-fire ticket was an unconventional one: a recording of me singing a very funny (and embarrassing) version of a song I wrote, inspired by Walid Sahrij’s Haliana (click to hear the classic original, or see my rendition below).

Samnanai:

Like you, he agreed that it needs all the help it can get and gave me the best computer available.

Now, I could follow the online-recipe blueprint and keep typing a wall of text about how it was done and what I feel the implications are, but let’s skip ahead to the final result so you can judge whether it’s worth it. I’ll only note two things:

  1. This was done over the course of a day of tweaking parameters. I did not do any post-mixing.
  2. I trained the model using only the above sample of me singing in Arabic, plus 10 minutes of me speaking in Arabic. Should you feel the difference is significant enough

With the above Arabic training set, I was able to clone my voice onto the pitch and rhythm of not only Walid in Haliana, but also a Spanish song: La Bachata by Manuel Turizo.

Arabic Haliana:

Spanish La Bachata:

Is it perfect? Certainly not. Is it a huge step up from what I can do, especially in an entirely different language the model was never trained on?

It sure as Ice-Cube’s hell is.

So why do I think this is important?

Firstly, we are one step closer to expressing a song I wrote the way I imagined it. (With the model I have, I cannot adapt new lyrics; I can only change the voice of a song's singer.) Whether that is a net positive or a net negative to your ears is up for debate, but what is not up for debate is that, by extension, we are also one step closer to many others expressing songs they have written.

And it is that verb, express, that I want to focus on. Much of the conversation about A.I. in music has focused on copyright and monetization. Partly for good reason – I think artists should continue to make a living and thrive as they bring us all invaluable work.

But the rest of us have different reasons to create and express. It is almost cliché to say that, in their purest form, artists would create for free. Art is freeing: it surfaces perspectives and feelings we may not know we harbor, it allows us to safely entertain the other side of a story without actually living it, it allows us to be enveloped in a project with a tangible result at the end, it is cathartic, and there are even evidence-based mental health therapies built around it. Those are just a few of its many advantages.

In a utopia where the value generated by A.I. is eventually distributed to make day-to-day life non-capitalistic, this will be crystal clear. People will create anyway; of that I have no doubt.

But let us assume that we won’t reach a utopia for a long time (not a far leap). I wager that many people dabbling in art forms they are not trained in would love to express their work in an A.I.-assisted way, even with full knowledge that doing so eliminates (or reduces) their ability to be compensated for it.

I also think that being able to express themselves in an A.I.-assisted way will lead more people to realize that, with classical training, they could actually be artists and create unassisted (or with whatever level of assistance we collectively agree is monetizable). In the most literal terms, people will start to hear themselves “doing it,” and therefore, do it.

We are currently a civilization clearly lacking confidence.

“Just do it” is the affirmation required to run a marathon.

“Yes, we can.” For the love of God, that drove a presidential campaign.

Three very empowering words, don’t get me wrong. But maybe a presidential nominee could run a campaign around the real issues of what “we can” and “should” do, instead of reminding us that we could do it. After all, if we hear ourselves singing in a language we’re not fluent in, maybe we won’t need reminders of how powerful we are.

Maybe then we can work to agree on what it is we are going to do, rather than being driven to polarization when we are all muddy from actually “doing it” and realize we don’t quite agree on “what to do.”

In the coming years, the world is clearly going to change in a major way. There will be a lot of issues and risks we have to work through and around. I propose that to reach the rewards that make those types of risks worthwhile, we all need the best versions of ourselves to be at the forefront.

Helping people understand that “yes, they can” in one part of their lives will help them believe that “yes, they can” in all the rest.

And maybe then, “we can” all drive towards the best future life possible, both individually and collectively.

Note: Please click here to follow the Retrieval-based-Voice-Conversion-WebUI model I used to create the above renditions.