Even more Voice Cloning | Train a Multi-Speaker VITS model using Google Colab and a Custom Dataset

I’ve been looking at multi-speaker VITS TTS models lately, so I thought I’d share this Google Colab notebook. It’s similar to the others I’ve posted, but this one uses precomputed speaker vectors; the configuration resembles the YourTTS model, though this seems a little easier to fine-tune. As always, this stuff is experimental, but it should help you get started if you want to poke around at training a multi-speaker, English-language VITS model using the Coqui TTS framework.
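The precomputed-vector setup described above boils down to a few config fields. Here’s a minimal sketch assuming Coqui TTS’s `VitsConfig`/`VitsArgs` API — field names like `use_d_vector_file` can vary between TTS versions, and the file names and dataset path are placeholders, so treat this as a starting point rather than the notebook’s exact configuration:

```python
# Sketch: multi-speaker VITS config using precomputed speaker vectors
# (d-vectors) instead of a learned speaker-embedding layer.
# Field names follow Coqui TTS's VitsConfig/VitsArgs and may differ
# between library versions.
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    use_d_vector_file=True,          # consume precomputed vectors
    d_vector_file="speakers.json",   # placeholder: embeddings computed ahead of time
    d_vector_dim=512,
    use_speaker_embedding=False,     # no learned speaker-embedding table
)

dataset_config = BaseDatasetConfig(
    formatter="ljspeech",            # placeholder; match your dataset's layout
    meta_file_train="metadata.csv",
    path="MyTTSDataset/",            # placeholder dataset path
)

config = VitsConfig(
    model_args=model_args,
    run_name="vits_multispeaker_en",
    datasets=[dataset_config],
    batch_size=16,
    epochs=1000,
)
```

The notebook handles computing the embeddings and wiring them into training; the point here is just that switching between a learned speaker embedding and precomputed vectors is a config-level change.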

Multi-Speaker English language VITS training Colab Notebook: https://colab.research.google.com/drive/1wAuG-TcZeAUYhff0f6ZiG-so9KT-sBIE?usp=sharing

YourTTS video discussing the same training options that can be used here as well: https://www.youtube.com/watch?v=1yt2W-uK8mk

Real time noise suppression plugin: https://github.com/werman/noise-suppression-for-voice

Audacity: https://www.audacityteam.org/

Coqui’s Dataset Guide: https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset

rnnoise: https://github.com/xiph/rnnoise

Download my multilingual, multi-speaker YourTTS model on Hugging Face: https://huggingface.co/AOLCDROM/YourTTS-Fr-En-De-Es

See allvoices.txt for information about each speaker:language training pair. The model was trained on character sets and uses ‘artificial’ language codes.

Generate text with the CLI:

tts --text "text" --out_path outfile.wav --model_path path/to/model_file.pth --config_path path/to/config.json --speakers_file_path speakers/index/path/speakers.pth --speaker_idx VCTK_speaker
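Since the YourTTS model above is multilingual, the same CLI call also needs a language selector. A hedged variant, assuming the `tts` CLI’s `--language_idx` flag applies to this model — the actual speaker and language codes have to come from allvoices.txt, and the paths are placeholders:

```shell
# Hypothetical multilingual invocation; look up the speaker ID and the
# 'artificial' language code for your target voice in allvoices.txt.
tts --text "text" \
    --out_path outfile.wav \
    --model_path path/to/model_file.pth \
    --config_path path/to/config.json \
    --speakers_file_path speakers/index/path/speakers.pth \
    --speaker_idx VCTK_speaker \
    --language_idx <language_code_from_allvoices>
```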
