AI Audio Tools – nanonomad

DiffRhythm – Fast, Full-Length Song Generation

Posted on April 17, 2025 (April 17, 2025) by nanonomad

A look at DiffRhythm, a diffusion model for generative music. This one is FAST. I’m looking over the demos, sharing some installation notes, trying some demo generations, seeing what works and what doesn’t, and trying to make a decent-sounding tune. This is not a detailed tutorial. DiffRhythm Hugging Face demo – https://huggingface.co/spaces/ASLP-lab/DiffRhythm DiffRhythm demo page […]

YuE can’t have Suno.AI We have Generative Music at Home!

Posted on February 25, 2025 (February 26, 2025) by nanonomad

I recently discovered M-A-P YuE, an open-source AI music generator that creates complete songs with lyrics. While the original model demands a hefty 80GB of VRAM, I came across a great GitHub project by Mozer called “YuE-extend”. This adds music extension support (with the -icl models) and exllamav2 quantized model loading. That makes it possible to […]

Training LoRAs and GLoRAs for Stable Diffusion 1.5 and XL Using the New Prodigy Optimizer

Posted on January 18, 2024 (April 2, 2024) by nanonomad

Training LoRA and GLoRA on SD 1.5 & XL with the Prodigy Optimizer using the Kohya_SS scripts. In today’s video I look at training LoRA and GLoRA adapters for Stable Diffusion 1.5 and XL using the Prodigy optimizer on a large and varied dataset made up of 16 characters. Then I show an example of how […]

A look at XTTS v1 and Tools for Comparing Audio Embeddings

Posted on September 21, 2023 (April 2, 2024) by nanonomad

In this video I look at Coqui’s new XTTS v1 text-to-speech model, and complain about licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. This can be used to identify mismatching audio clips in your datasets. Remove poor-quality clips, and […]

Remove Background Music and Enhance Speech with Free AI Tools | Avoid ContentID on YouTube

Posted on August 8, 2023 (August 28, 2023) by nanonomad

A look at using Ultimate Vocal Remover, a free frontend for AI audio source separation models, to remove background music from TV clips and radio broadcasts. Then, using FFmpeg to separate audio tracks and to recombine single or multiple audio tracks back into a video. Ultimate Vocal Remover GUI GitHub: https://github.com/Anjok07/ultimatevocalremovergui FFmpeg Windows […]