DiffRhythm – Fast, Full-Length Song Generation

A look at DiffRhythm, a diffusion model for generative music. This one is FAST. I’m looking over the demos, sharing some installation notes, trying some demo generations, seeing what works and what doesn’t, and trying to make a decent-sounding tune. This is not a detailed tutorial. DiffRhythm Hugging Face demo – https://huggingface.co/spaces/ASLP-lab/DiffRhythm DiffRhythm demo page […]


YuE can’t have Suno.AI We have Generative Music at Home!

I recently discovered M-A-P YuE, an open-source AI music generator that creates complete songs with lyrics. While the original model demands a hefty 80GB of VRAM, I came across a great GitHub project by Mozer called “YuE-extend”. It adds music extension support (with the -icl models) and exllamav2 quantized model loading. That makes it possible to […]


Training LoRAs and GLoRAs for Stable Diffusion 1.5 and XL Using the New Prodigy Optimizer

Training LoRA and GLoRA on SD 1.5 & XL with the Prodigy Optimizer using the Kohya_SS scripts. In today’s video, I look at training LoRA and GLoRA adapters for Stable Diffusion 1.5 and XL using the Prodigy optimizer on a large and varied dataset made up of 16 characters. Then I show an example of how […]


A look at XTTS v1 and Tools for Comparing Audio Embeddings

In this video I look at Coqui’s new XTTS v1 text-to-speech model, and complain about licensing. Then I look at a couple of tools, pyannote and SpeechBrain, and use a model to generate and compare audio embeddings. This can be used to identify mismatching audio clips in your datasets. Remove poor-quality clips, and […]
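
As a rough illustration of the embedding comparison idea, here is a minimal sketch in plain Python. It assumes you already have one embedding vector per clip (for example, exported from a speaker embedding model); the toy vectors, the function names, and the 0.5 threshold are made up for the example and are not taken from the video. Each clip is compared, by cosine similarity, against the average embedding of all the other clips, and clips that score low are flagged.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def find_mismatched_clips(embeddings, threshold=0.5):
    """Return clip names whose embedding is unlike the rest of the set.

    embeddings: dict mapping clip name -> embedding vector (list of floats).
    threshold:  hypothetical cutoff; tune it for your own model and data.
    """
    flagged = []
    for name, vector in embeddings.items():
        others = [v for other, v in embeddings.items() if other != name]
        if not others:
            continue  # a single clip has nothing to be compared against
        if cosine_similarity(vector, mean_vector(others)) < threshold:
            flagged.append(name)
    return flagged


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
clips = {
    "a.wav": [1.0, 0.1, 0.0],
    "b.wav": [0.9, 0.2, 0.0],
    "c.wav": [0.0, 0.1, 1.0],  # deliberately different from the others
    "d.wav": [1.0, 0.0, 0.1],
}
print(find_mismatched_clips(clips))
```

With real embeddings the threshold needs tuning, and a flagged clip is only a candidate for review, since a low score can also come from noise or background music rather than a wrong speaker.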


Remove Background Music and Enhance Speech with Free AI Tools | Avoid ContentID on YouTube

A look at using Ultimate Vocal Remover, a free frontend for AI audio source separation models, to remove background music from TV clips and radio broadcasts. Then I use FFmpeg to separate audio tracks, and to recombine single and multiple audio tracks back into a video. Ultimate Vocal Remover GUI GitHub: https://github.com/Anjok07/ultimatevocalremovergui FFmpeg Windows […]
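
The FFmpeg steps can be sketched roughly as follows. This is not the exact set of commands from the video; it is a minimal example that only builds the argument lists, using standard FFmpeg options (`-i`, `-map`, `-vn`, `-c:v copy`, `-c:a`). The file names and the choice of AAC for the output audio are assumptions for illustration.

```python
import shlex


def extract_audio_cmd(video_path, audio_out, track=0):
    """Build an FFmpeg command that pulls one audio track out of a video.

    -map 0:a:N selects the Nth audio stream of the first input;
    -vn drops the video stream from the output.
    """
    return ["ffmpeg", "-i", video_path, "-map", f"0:a:{track}", "-vn", audio_out]


def remux_cmd(video_path, audio_paths, output_path):
    """Build an FFmpeg command that combines a video with one or more audio files.

    The video stream is copied without re-encoding; each audio file becomes
    its own audio track in the output, encoded as AAC (an assumed choice).
    """
    cmd = ["ffmpeg", "-i", video_path]
    for path in audio_paths:
        cmd += ["-i", path]
    cmd += ["-map", "0:v:0"]  # video from the first input
    for index in range(1, len(audio_paths) + 1):
        cmd += ["-map", f"{index}:a:0"]  # first audio stream of each audio input
    cmd += ["-c:v", "copy", "-c:a", "aac", output_path]
    return cmd


# Hypothetical file names, printed as shell commands for inspection.
print(shlex.join(extract_audio_cmd("clip.mp4", "clip.wav")))
print(shlex.join(remux_cmd("clip.mp4", ["vocals.wav"], "clip_clean.mp4")))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed. Passing several audio files to `remux_cmd` produces a video with multiple audio tracks.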
