Fine Tuning XTTS V2 with (forked) Coqui

Fine-tuning XTTS v2 with the forked Coqui project. Coqui AI shut down earlier this year, so what does that mean for us? Here I go over adjusting the Coqui XTTS v2 training recipe, creating a dataset using Audacity and faster-whisper, and training single-speaker and multispeaker XTTS v2 English models. Convert wavs to […]
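The dataset step above ultimately produces LJSpeech-style metadata: one pipe-delimited line per clip, pairing a wav file stem with its transcript. A minimal sketch of that output format (the clip names and transcripts here are made up for illustration, and the "normalized" column is just a lowercase placeholder rather than real text normalization):

```python
def ljspeech_metadata_rows(clips):
    """Build LJSpeech-style metadata rows: id|raw text|normalized text.

    `clips` maps a wav file stem to its transcript (e.g. as produced by
    faster-whisper); real normalization would expand numbers and
    abbreviations, but lowercasing stands in for it here.
    """
    rows = []
    for stem, text in sorted(clips.items()):
        rows.append(f"{stem}|{text}|{text.lower()}")
    return rows

# Hypothetical clip stems and transcripts for illustration.
clips = {"clip_0001": "Hello there.", "clip_0002": "Testing one two three."}
print("\n".join(ljspeech_metadata_rows(clips)))
```

Each row would then be written to a `metadata.csv` alongside a `wavs/` folder, which is the layout the training recipe's formatter expects.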

Read More… from Fine Tuning XTTS V2 with (forked) Coqui

Are Text Cleaners Making Your TTS Models Sound Bad? | TTS Model Training Tips

In this video, I look at text cleaners and how they could potentially be causing issues when training your TTS models. I refer to the cleaners in the Tortoise TTS AI Voice Cloning WebUI (MRQ) and Coqui TTS. Messy and unfinished LJSpeech-format dataset markup/processing script: […]
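To make the failure mode concrete, here is a toy cleaner in the spirit of an English text cleaner (the abbreviation table is illustrative, not any library's exact list). The point: if the expanded text doesn't match what the speaker actually said on the recording, the model trains on a mismatched transcript.

```python
import re

# Illustrative abbreviation expansions; real cleaners carry longer lists.
_ABBREVIATIONS = [
    (re.compile(rf"\b{abbr}\.", re.IGNORECASE), full)
    for abbr, full in [
        ("mr", "mister"), ("mrs", "misess"), ("dr", "doctor"), ("st", "saint"),
    ]
]

def basic_english_cleaner(text: str) -> str:
    """Lowercase, expand a few abbreviations, and collapse whitespace."""
    text = text.lower()
    for pattern, replacement in _ABBREVIATIONS:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()

# Note the ambiguity: "St." becomes "saint" even where "street" was meant,
# so the training text can silently diverge from the spoken audio.
print(basic_english_cleaner("Dr.  Smith lives on St. James St."))
```

Checking what a cleaner actually emits for your dataset, as above, is a quick way to catch these mismatches before training.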

Read More… from Are Text Cleaners Making Your TTS Models Sound Bad? | TTS Model Training Tips

Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset

I’ve been looking at multispeaker VITS TTS models lately, so I thought I’d share the Google Colab notebook. It’s similar to the others posted, but this one uses precomputed speaker vectors; the configuration is similar to the YourTTS model, but this seems a little easier to fine-tune. As always, this stuff is experimental, but this should […]
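"Precomputed vectors" here means each speaker is conditioned on a fixed embedding computed ahead of training rather than a learned speaker ID. A common way to get a stable per-speaker vector is to average per-clip embeddings from a speaker encoder; a minimal sketch with toy 4-dimensional embeddings (the speaker names and values are invented for illustration):

```python
import numpy as np

def average_d_vectors(clip_embeddings):
    """Average per-clip speaker embeddings into one d-vector per speaker.

    `clip_embeddings` maps speaker name -> list of embedding vectors
    (e.g. extracted with a speaker encoder over each training clip).
    """
    return {
        speaker: np.mean(np.stack(vectors), axis=0)
        for speaker, vectors in clip_embeddings.items()
    }

# Toy 4-dim embeddings for two hypothetical speakers.
embeddings = {
    "speaker_a": [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])],
    "speaker_b": [np.array([0.0, 0.0, 1.0, 1.0])],
}
means = average_d_vectors(embeddings)
print(means["speaker_a"])
```

The resulting per-speaker vectors would then be saved and pointed to from the model config, which is what makes this setup easier to fine-tune than relearning speaker identities from scratch.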

Read More… from Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset

Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab

This is about as close to automated as I can make things. I’ve put together a Colab notebook that uses a bunch of spaghetti code, rnnoise, OpenAI’s Whisper Speech to Text, and Coqui Text to Speech to train a VITS model. Upload audio files, split and process clips, denoise clips, transcribe clips with Whisper, then […]
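The split-and-process step in that pipeline can be sketched as a toy silence splitter: any long enough run of near-silent samples ends the current clip. This is only a stand-in for illustration (the threshold and gap values are invented, and real splitting works on frame energies of actual audio, not raw sample lists):

```python
def split_on_silence(samples, threshold=0.01, min_gap=4):
    """Split a mono sample list into clips at runs of near-silence.

    A run of at least `min_gap` samples below `threshold` amplitude
    ends the current clip; the silent tail is dropped from it.
    """
    clips, current, quiet = [], [], 0
    for s in samples:
        quiet = quiet + 1 if abs(s) < threshold else 0
        current.append(s)
        if quiet >= min_gap:
            voiced = current[:-quiet]
            if voiced:
                clips.append(voiced)
            current, quiet = [], 0
    if any(abs(s) >= threshold for s in current):
        clips.append(current)
    return clips

# Two bursts of "speech" separated by silence -> two clips.
audio = [0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3]
print(len(split_on_silence(audio)))
```

Each resulting clip would then go through the denoise and Whisper transcription steps before landing in the training metadata.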

Read More… from Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab