A look at XTTS v1 and Tools for Comparing Audio Embeddings

In this video I look at Coqui’s new XTTS v1 text-to-speech model and complain about licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. This can be used to identify mismatching audio clips in your datasets. Remove poor-quality clips, and […]
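As a rough illustration of the comparison step, here is a minimal sketch that flags clips whose embedding sits far from the rest of the dataset. It assumes the embeddings have already been extracted by some speaker-embedding model (the extraction itself is not shown), and it uses toy three-dimensional vectors in place of real ones. The function names, the centroid-based approach, and the 0.75 threshold are all assumptions chosen for illustration, not the method used in the video.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def flag_mismatched_clips(embeddings, threshold=0.75):
    """Return the names of clips whose similarity to the dataset centroid is below threshold.

    `embeddings` maps a clip name to its embedding vector. The threshold is a
    hypothetical starting point and would need tuning on real data.
    """
    center = centroid(list(embeddings.values()))
    scores = {
        name: cosine_similarity(vector, center)
        for name, vector in embeddings.items()
    }
    return sorted(name for name, score in scores.items() if score < threshold)


# Toy vectors standing in for real model output (hypothetical file names).
toy_embeddings = {
    "clip_001.wav": [0.9, 0.1, 0.0],
    "clip_002.wav": [0.8, 0.2, 0.1],
    "clip_003.wav": [0.85, 0.15, 0.05],
    "clip_004.wav": [0.0, 0.1, 0.95],  # deliberately unlike the others
}

flagged = flag_mismatched_clips(toy_embeddings)
print(flagged)
```

Comparing against a centroid is only one option. Pairwise comparison against a known-good reference clip would work with the same `cosine_similarity` helper.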

Are Text Cleaners Making Your TTS Models Sound Bad? | TTS Model Training Tips

In this video, I look at text cleaners and how they could be causing issues when training your TTS models. I refer to cleaners in the Tortoise TTS AI Voice Cloning WebUI (MRQ) and Coqui TTS. Messy and unfinished LJSpeech-format dataset markup/processing script: […]

.:Demo:. Tortoise TTS Expressive Speech narrating Norman Arkawy’s 1955 Sci-Fi short “Selling Point”

Narration of the short story ‘Selling Point’ by Norman Arkawy using a Tortoise TTS model generating a familiar-sounding, expressive British voice. Originally published in ‘Imagination Stories of Science and Fantasy’, December 1955. One of my favorite short stories. Full story text: https://www.gutenberg.org/cache/epub/66713/pg66713.txt Training: 10 epochs total. Epochs 1-4 LR 1e-5, 5-6 LR 1e-6, 7-10 LR 1e-7. Mel/Text: […]
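The stepped learning-rate schedule listed above can be written down as a small lookup. This is only a sketch of the epoch-to-LR mapping as stated in the post. How the schedule was actually configured in the training tool is not given here, and the function name is hypothetical.

```python
def learning_rate_for_epoch(epoch):
    """Map a 1-based epoch number to the learning rate listed for that epoch."""
    if not 1 <= epoch <= 10:
        raise ValueError("the schedule covers epochs 1 through 10 only")
    if epoch <= 4:
        return 1e-5  # epochs 1-4
    if epoch <= 6:
        return 1e-6  # epochs 5-6
    return 1e-7      # epochs 7-10


# Print the full schedule, one line per epoch.
for epoch in range(1, 11):
    print(epoch, learning_rate_for_epoch(epoch))
```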

.::Demo::. 4 Voice Multispeaker Tortoise TTS English Fine-Tuned Model Test :: Great Dictator Speech

First test of the new Tortoise model. 4 voices, which can also be found in the YourTTS model I posted recently. LJS, John, and Tom rendered without any stammers or repeats; Lah has some stutters, if I recall. No cherry-picked examples. Gen settings same as in my Tortoise fine-tuning video, except denoise set to […]

Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset

I’ve been looking at multispeaker VITS TTS models lately, so I thought I’d share the Google Colab notebook. It’s similar to the others posted, but this one uses precomputed vectors. The configuration is similar to the YourTTS model, though this seems a little easier to fine-tune. As always, this stuff is experimental, but this should […]

Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab

This is about as close to automated as I can make things. I’ve put together a Colab notebook that uses a bunch of spaghetti code, rnnoise, OpenAI’s Whisper Speech to Text, and Coqui Text to Speech to train a VITS model. Upload audio files, split and process clips, denoise clips, transcribe clips with Whisper, then […]
