Uncategorized – nanonomad

Standalone Whisper XXL: The Hassle-Free Transcription Tool

Posted on February 10, 2025 (February 26, 2025) by nanonomad

I recently discovered a GitHub project, covered in this video, that has become my go-to for transcription work: Standalone Whisper XXL by Purfview. If you’ve tried setting up OpenAI’s Whisper speech-to-text model before, you know it can get messy with dependencies, especially when using enhanced forks like faster-whisper. This project solves all those headaches […]

Stable Audio Open 1.0 | Open Source Generative Audio with Fine Tuning

Posted on June 11, 2024 (June 11, 2024) by nanonomad

A look at Stability AI’s new Stable Audio Open 1.0 open source (kinda, sorta, mostly, technically) model and codebase with fine tuning support (kinda, sorta, technically). I’ve managed to get the trainer running, but I only have a 12 GB GPU, which isn’t enough for training right now. Resources: https://huggingface.co/stabilityai/stable-audio-open-1.0 https://github.com/Saganaki22/StableAudioWebUI https://github.com/Stability-AI/stable-audio-tools/issues/34 Example training launch command: python […]

Bark Voice Cloning, TTS, RVC, Music Generation and More with the TTS Generation WebUI

Posted on April 2, 2024 by nanonomad

A look at the TTS Generation Web UI for Bark text-to-speech, music generation, translation, and more. Just a broad overview of what seems to work well, and not so well, in this feature-packed project. Video Link: https://www.youtube.com/watch?v=Y8J717tr9t0 Sources for RVC Models: https://rvc-models.com/ https://voice-models.com/ https://huggingface.co/spaces/zomehwh… TTS Generation WebUI: https://github.com/rsxdalv/tts-generation-webui […]

Site will be updated/worked on soon

Posted on December 23, 2023 by nanonomad

I’ve been dealing with some health issues, so the site remains unfinished. Videos are on pause for now, but I’m still working on a few things. In addition, my PC finally died from the load of training ML models nearly 24/7 for the past year. Either the CPU or the motherboard is completely dead. No visual signs […]

Automate Image Captioning using Multimodal LLMs

Posted on November 19, 2023 (April 2, 2024) by nanonomad

Using multimodal large language models for automated image captioning. Rich captions can be used for training Stable Diffusion DreamBooth models or LoRAs. […]