XTTSv2 Hindi Finetuned Checkpoints
For use in most implementations of XTTSv2, these must be renamed to model.pth and used to replace the original XTTSv2 checkpoint. https://huggingface.co/AOLCDROM/XTTSv2-Hi_ft/tree/main
Indic TTS Hindi Dataset: https://www.iitm.ac.in/donlab/indictts/database
Common Voice Dataset: https://commonvoice.mozilla.org/en/datasets
Convert Mozilla Common Voice .TSV to VCTK-format dataset metadata: conv_cv_vctk.py
Download and install ffmpeg, and add it to your Windows system […]
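As a rough illustration of the kind of conversion conv_cv_vctk.py performs, here is a minimal sketch that reads a Common Voice validated.tsv (which has client_id, path, and sentence columns) and writes a VCTK-style wav48/ and txt/ layout; the directory names and ffmpeg settings below are assumptions, not the script’s actual code.

# Hypothetical sketch of a Common Voice .tsv -> VCTK-format conversion;
# the real conv_cv_vctk.py may differ. Assumes ffmpeg is on the PATH.
import csv
import subprocess
from pathlib import Path

CV_DIR = Path("cv-corpus/hi")        # assumed Common Voice download location
OUT_DIR = Path("vctk_dataset")       # assumed output dataset root

with open(CV_DIR / "validated.tsv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        speaker = row["client_id"][:12]          # shorten the hash to a speaker ID
        clip_id = Path(row["path"]).stem
        wav_dir = OUT_DIR / "wav48" / speaker
        txt_dir = OUT_DIR / "txt" / speaker
        wav_dir.mkdir(parents=True, exist_ok=True)
        txt_dir.mkdir(parents=True, exist_ok=True)
        # Transcode the MP3 clip to mono 22.05 kHz WAV with ffmpeg
        subprocess.run([
            "ffmpeg", "-y", "-i", str(CV_DIR / "clips" / row["path"]),
            "-ar", "22050", "-ac", "1",
            str(wav_dir / f"{clip_id}.wav"),
        ], check=True)
        # VCTK keeps one transcript text file per clip
        (txt_dir / f"{clip_id}.txt").write_text(row["sentence"], encoding="utf-8")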
Tag: Coqui TTS
A look at XTTS v1 and Tools for Comparing Audio Embeddings
In this video I look at Coqui’s new XTTS v1 text-to-speech model, and complain about licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. This can be used to identify mismatched audio clips in your datasets. Remove poor-quality clips, and […]
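To give a flavor of the embedding-comparison idea, here is a sketch that uses SpeechBrain’s pretrained ECAPA-TDNN speaker encoder to embed two clips and score their cosine similarity; the file names and the threshold mentioned at the end are assumptions you would tune on your own data, and pyannote could stand in for the encoder.

# Sketch: compare speaker embeddings of two dataset clips with SpeechBrain.
# A low similarity suggests the clips may not come from the same speaker.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

def embed(path: str) -> torch.Tensor:
    signal, sr = torchaudio.load(path)
    if sr != 16000:  # the VoxCeleb encoder expects 16 kHz audio
        signal = torchaudio.functional.resample(signal, sr, 16000)
    return encoder.encode_batch(signal).squeeze()

a = embed("clips/clip_001.wav")   # hypothetical file names
b = embed("clips/clip_002.wav")
score = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
print(f"cosine similarity: {score:.3f}")
# A cutoff around 0.5 is a guess; flag pairs well below it for manual review.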
Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset
I’ve been looking at multi-speaker VITS TTS models lately, so I thought I’d share the Google Colab notebook. It’s similar to the others posted, but this one uses precomputed speaker vectors; the configuration is similar to the YourTTS model’s, though this seems a little easier to fine-tune. As always, this stuff is experimental, but this should […]
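For the curious, Coqui TTS exposes the precomputed-vector setup through its VITS model arguments. The sketch below shows roughly how such a configuration looks; the file paths, dimensions, and training settings are placeholders, not the notebook’s exact values.

# Sketch of a multi-speaker VITS config in Coqui TTS using precomputed
# d-vectors (as in YourTTS) instead of a learned speaker embedding table.
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    use_d_vector_file=True,            # read speaker vectors from a file
    d_vector_file="speakers.json",     # precomputed with a speaker encoder
    d_vector_dim=512,                  # must match the encoder's output size
    use_speaker_embedding=False,       # no learned embedding table
)

config = VitsConfig(
    model_args=model_args,
    run_name="multispeaker_vits",      # placeholder run name
    batch_size=16,
    epochs=1000,
    output_path="output/",
)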
Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab
This is about as close to automated as I can make things. I’ve put together a Colab notebook that uses a bunch of spaghetti code, rnnoise, OpenAI’s Whisper speech-to-text, and Coqui TTS to train a VITS model. Upload audio files, split and process clips, denoise clips, transcribe clips with Whisper, then […]
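To give a flavor of the transcription step, here is a minimal sketch using the openai-whisper package to transcribe a folder of processed clips and write Coqui/LJSpeech-style metadata (one file_id|transcript line per clip); the model size and paths are assumptions, not the notebook’s exact code.

# Sketch of the Whisper transcription step: turn a folder of denoised
# clips into an LJSpeech-style metadata.csv. Paths are placeholders.
from pathlib import Path
import whisper

model = whisper.load_model("small")   # model size is a speed/accuracy tradeoff
clips = sorted(Path("wavs").glob("*.wav"))

with open("metadata.csv", "w", encoding="utf-8") as f:
    for clip in clips:
        result = model.transcribe(str(clip))
        text = result["text"].strip()
        # LJSpeech format used by Coqui recipes: file_id|transcript
        f.write(f"{clip.stem}|{text}\n")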