Since the recent announcements of OpenAI’s ChatGPT, Google’s Bard, and Baidu’s Ernie Bot, the industry has been in a frenzy advancing generative AI products and solutions. Brainy Insights estimates that the generative AI market will grow from USD 8.65 billion in 2022 to USD 188.62 billion by 2032. This translates to a compound annual growth rate (CAGR) of just over 36%, making generative AI one of the hottest areas of AI innovation. The software segment accounted for the highest revenue share, 65.0%, in 2021 and is expected to retain its position over the forecast period.
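The growth figures above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end value / start value)^(1 / years) - 1. The short Python sketch below applies it to the report's 2022 and 2032 values, assuming a 2032 figure of USD 188.62 billion and a ten-year span; the helper name `cagr` is purely illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in USD billions, 2022 -> 2032 (assumed ten-year span).
rate = cagr(8.65, 188.62, 10)
print(f"Implied CAGR: {rate:.1%}")
```

Running it gives a rate of roughly 36.1%, in line with the "just over 36%" headline figure.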
What is Generative AI?
Generative AI is a form of AI that produces various types of content, including text, imagery, audio, and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics, and videos in a matter of seconds. Although the technology is not new, the introduction of generative adversarial networks (GANs), a type of machine learning algorithm, has accelerated innovation in this form of AI.
Coqui – Generative AI Will Revolutionize Voice
The exciting news is that former Mozillians have just raised $3.3M for Coqui, a generative AI speech-synthesis company for creatives. Prior to founding Coqui, CEO Kelly Davis led the Mozilla Machine Learning Group, which focused on speech technology. Before that, he worked at the Max Planck Institute for Gravitational Physics and did his Ph.D. work in superstring theory.
The company was founded in 2021 by Eren Gölge, Josh Meyer, Kelly Davis, and Reuben Morais, all of whom worked in Mozilla’s machine learning group. Funding has come from leading players: ScaleX Ventures, Mango Capital, DNX Ventures, and angel investors. At Mozilla, they spent years working on speech technology but found traditional approaches to creating and controlling voices at best lacking and at worst nonexistent.
The Coqui founders have a bold strategy to provide generative AI voices for video game developers, audio post-production, and all creatives. When I asked Kelly what his vision for the company was, he put it simply: Coqui wants to be Photoshop for voice.
It is a bold vision, but what they have already built is powerful: Coqui enables creatives to quickly and easily create, cast, and direct AI voice actors without the usual overhead. Users can create custom voices from a prompt (e.g., “Old man who smokes two packs a day”), cast out-of-the-box and custom voices in their projects, and direct every nuance of each performance. Coqui’s AI voices will save time, money, and headaches, drastically reducing the time spent casting and recording in the studio as well as in post-production.
“We started Coqui because, using traditional approaches, we were spending months gathering custom voice data, weeks training custom voice models, and still found it impossible to direct every nuance of a voice’s performance. It was frustrating! There had to be a better way,” said co-founder and CEO Kelly Davis. “Later, we realized that everyone had the same problem! So, we rolled up our sleeves and got to work on a solution.”
For creatives, voice is a double-edged sword.
With the slightest shift in tone, it can paint the most detailed picture of our inner lives; however, it’s a nightmare to work with: casting, recording, directing, scheduling, booking a studio, and then doing it all again in post-production. Creatives crave a simple solution, and Coqui scratches that itch. Coqui provides high-quality, out-of-the-box AI voices; quick voice cloning; prompt-to-voice; and the ability to direct every nuance of a voice’s performance. It’s a single place for casting, recording, directing, and scheduling, with everything at your fingertips at the time and place of your choosing.
“After chatting to tons of creatives working on video games, audio post-production, dubbing, and lots of other disciplines, we know that the standard mantra of casting, recording, directing, scheduling… is slowing development and costing time and money. Voice needs to be dragged into the 21st century, and generative AI is doing it,” says Kelly Davis, co-founder and CEO of Coqui and former head of Mozilla’s Machine Learning Group.
The funding will be used to grow the sales and development teams and to accelerate growth in the US market.
The generative AI revolution is everywhere, and it’s a massive opportunity to lower production costs, accelerate development, and iterate faster. Coqui is bringing this revolution to voice. With high-quality, out-of-the-box AI voices; quick voice cloning; prompt-to-voice; and the ability to direct every nuance of a voice’s performance, Coqui is your on-ramp to voice’s generative revolution.
There is no question that the voice revolution is underway. Coqui is entering later than other industry players, such as Altered AI, which provides speech-to-speech technology; Replica AI, which provides game-engine integration; and Spotify, which recently acquired Sonantic, a provider of natural-sounding AI voices.
What stands out about Coqui is the founders’ depth of expertise in voice and AI/ML. Such a tightly knit co-founding team gives them a cohesion advantage that should stand them in good stead as they move into a voice industry that badly needs productivity improvements, particularly streamlined workflows.
Roger Love, one of the most iconic voice coaches in the world (he trained Bradley Cooper to sing in A Star Is Born and helped Jeff Bridges win an Academy Award for his singing voice in Crazy Heart), is the CEO and co-founder of Emotional Cloud, a company using generative AI to help humans and machines hold more accurate, emotionally relevant conversations with each other. He is at the forefront of voice cloning and understands that without emotional depth and accuracy, these AI methods won’t truly advance human civilization; rather, we risk eroding what is uniquely human.
One positive sign is that Coqui is paying special attention to emotional variance and valence in voice patterns.
That said, there are still major risks: disruptive voice innovations like these will affect the jobs of voice actors and other creatives. Yes, there will be greater efficiency and lower costs, and the voice industry, like many creative fields from text and graphics to video, is in need of a massive overhaul. But there will also be an imbalance unless we take care to ensure social responsibility and a thoughtful industry transformation.
This is not a new reality for disruptive innovations, but it is an area where stronger ethical and responsible-AI regulatory controls will be needed to ensure social responsibility is continually factored into all AI industries.
Innovations like Coqui are making sound waves, and their efforts will no doubt leapfrog other industry players.
For additional insights on AI’s impact on the music industry, see Dr. Cindy Gordon’s articles below.
- Brainy Insights. Generative AI Market Growth Report.
- Brookings Institution. Early Thoughts on Regulating Generative AI.
- Coqui. Company website.
- Gordon, Cindy. AI Impact on the Music Industry, Article One and Article Two. Forbes.
- Lawton, George. Everything You Wanted to Know About Generative AI. TechTarget.