Stable Audio Open 1.0 | Open Source Generative Audio with Fine Tuning

A look at Stability AI’s new Stable Audio Open 1.0 model and codebase, which is open source (kinda, sorta, mostly, technically) and supports fine tuning (kinda, sorta, technically). I’ve managed to get the trainer running, but I only have a 12 GB GPU, which isn’t enough for training right now. Resources: https://huggingface.co/stabilityai/stable-audio-open-1.0 https://github.com/Saganaki22/StableAudioWebUI https://github.com/Stability-AI/stable-audio-tools/issues/34 Example training launch command: python […]
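A quick back-of-envelope check shows why 12 GB falls short for full fine-tuning. This is a minimal sketch that assumes the common rule of thumb for mixed-precision AdamW: about 16 bytes per trainable parameter for weights, gradients and optimizer state, before counting activations. The parameter count used below is a hypothetical round number for illustration, not a figure taken from the model card.

```python
def estimate_full_finetune_vram_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Rough lower bound on GPU memory (decimal GB) for full fine-tuning.

    Assumes mixed-precision AdamW: 2 bytes fp16 weights + 2 bytes fp16 grads
    + 4 bytes fp32 master weights + 8 bytes for Adam's two fp32 moments
    = 16 bytes per parameter. Activations, batch size and framework
    overhead are NOT included, so real usage will be higher.
    """
    return num_params * bytes_per_param / 1e9


# Hypothetical parameter count for illustration only; check the model card.
ASSUMED_PARAMS = 1.0e9
GPU_VRAM_GB = 12

needed = estimate_full_finetune_vram_gb(ASSUMED_PARAMS)
print(f"~{needed:.1f} GB for weights, grads and optimizer state alone")
print(f"Fits in {GPU_VRAM_GB} GB?", needed <= GPU_VRAM_GB)
```

Even under these optimistic assumptions the optimizer state alone overshoots a 12 GB card, which is why tricks like 8-bit optimizers, gradient checkpointing or training only part of the model usually come up for smaller GPUs.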


Bark Voice Cloning, TTS, RVC, Music Generation and More with the TTS Generation WebUI

A look at the TTS Generation WebUI for Bark text-to-speech, music generation, translation and more. It’s a broad overview of what seems to work well, and what doesn’t, in this feature-packed project. Video Link: https://www.youtube.com/watch?v=Y8J717tr9t0 Sources for RVC Models: https://rvc-models.com/ https://voice-models.com/ https://huggingface.co/spaces/zomehwh… TTS Generation WebUI: https://github.com/rsxdalv/tts-generation-webui […]


Site will be updated/worked on soon

I’ve been dealing with some health issues, so the site remains unfinished. Videos are on pause for now, but I’m still working on a few things. In addition, my PC finally died from the load of training ML models nearly 24/7 for the past year. Either the CPU or the motherboard is completely dead. No visual signs […]


Automate Image Captioning using Multimodal LLMs

Using multimodal large language models for automated image captioning. Rich captions can be used for training Stable Diffusion DreamBooth models or LoRAs. […]
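The batch side of automated captioning can be sketched in a few lines. This is a minimal sketch, not the workflow from the post: it assumes the sidecar convention many LoRA/DreamBooth trainers accept (a `.txt` caption with the same file name as each image), and `dummy_captioner` is a hypothetical placeholder where the actual multimodal LLM request would go.

```python
import sys
from pathlib import Path
from typing import Callable

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def caption_folder(
    folder: Path,
    caption_fn: Callable[[Path], str],
    overwrite: bool = False,
) -> list[Path]:
    """Write a same-named .txt caption next to each image in `folder`.

    `caption_fn` stands in for whatever multimodal LLM call you use;
    it receives the image path and returns the caption text.
    """
    written = []
    for image_path in sorted(folder.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files, including existing captions
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists() and not overwrite:
            continue  # keep captions already written or hand-edited
        caption = caption_fn(image_path).strip()
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written.append(caption_path)
    return written


def dummy_captioner(image_path: Path) -> str:
    # Hypothetical placeholder: replace with a real multimodal LLM request.
    return f"a photo, {image_path.stem}"


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in caption_folder(Path(sys.argv[1]), dummy_captioner):
        print("wrote", path)
```

Skipping existing caption files by default means you can re-run the script after hand-fixing a few captions without losing those edits.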
