dirtnoise
@dirtnoise
My ComfyUI Stable Audio 1.0 workflow. Takes 5-30 sec per output on a 4070 Super.
https://github.com/comfyanonymous/ComfyUI
https://huggingface.co/stabilityai/stable-audio-open-1.0 : model.ckpt (+ finetunes if wanted)
https://huggingface.co/google-t5/t5-base : t5-basemodel.safetensors
Audio goes through the VAEEncodeAudio & VAEDecodeAudio nodes.
To load the safetensors and CLIP: RMB > Advanced > Loaders > Load CLIP and Load Checkpoint.
https://bafybeibhjd6rnwxz7bjgn7b6jgwy4qo2f52tddjk5jome2w4tqdaq7hdze.ipfs.dweb.link
2 replies
0 recast
2 reactions
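[Editor's note: a minimal sketch of the graph the post describes, expressed as a ComfyUI API prompt built in Python. The /prompt HTTP endpoint is real ComfyUI, but the exact node class names and input fields here are assumptions reconstructed from the post, not a verified export of this workflow; check them against your own node list before use.]

```python
# Sketch: build a ComfyUI-style prompt graph for Stable Audio Open 1.0
# and optionally submit it to a local ComfyUI server via its /prompt endpoint.
# ASSUMPTIONS: node class names / input fields are guessed from the post;
# file names (model.ckpt, t5-basemodel.safetensors) come from the post.
import json
import urllib.request

def build_stable_audio_graph(prompt_text, seconds=30.0, seed=0):
    """Return a ComfyUI prompt graph as a plain dict.

    Loaders correspond to the post's "RMB > Advanced > Loaders >
    Load CLIP and Load Checkpoint" step; output is decoded with
    VAEDecodeAudio as described.
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",       # Stable Audio Open 1.0
              "inputs": {"ckpt_name": "model.ckpt"}},
        "2": {"class_type": "CLIPLoader",                   # T5 text encoder
              "inputs": {"clip_name": "t5-basemodel.safetensors",
                         "type": "stable_audio"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt_text, "clip": ["2", 0]}},
        "4": {"class_type": "EmptyLatentAudio",             # text2audio start point
              "inputs": {"seconds": seconds, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["3", 0],
                         "negative": ["3", 0],              # reused for brevity only
                         "latent_image": ["4", 0],
                         "seed": seed, "steps": 50, "cfg": 5.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecodeAudio",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    }

def submit(graph, host="127.0.0.1", port=8188):
    """POST the graph to a running ComfyUI instance."""
    data = json.dumps({"prompt": graph}).encode()
    req = urllib.request.Request(f"http://{host}:{port}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req).read()

graph = build_stable_audio_graph("amen break, fast drum and bass loop")
```

In practice you would export the real graph from ComfyUI (Save (API Format)) rather than hand-writing it; the dict above only shows the shape of what gets submitted.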
C0rridor242
@c0rridor242
Interesting workflow! Can you elaborate on how you're using Stable Audio and CLIP for this project?
1 reply
0 recast
1 reaction
dirtnoise
@dirtnoise
https://youtu.be/cU9HHtNMNp0?si=RyFDotfdz12cXNO5 It's kind of like img2img: I input an mp3 file, which goes through VAEEncodeAudio > Latent Image, but I use 95-100 denoising, so it practically overwrites it completely. I've had the best results inputting piano samples or simple synth rhythms. It's very random, but some of it is quite impressive. I'm surprised how well it did with amen breaks and DnB, as Suno wasn't that good at those genres; it's clearly a different model. I might finetune my own model if I can get the A1000 for it.
0 reply
0 recast
0 reaction
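[Editor's note: a toy numpy sketch of why 95-100 denoise "practically overwrites" the input, as described above. This is not ComfyUI or Stable Audio internals, just the standard re-noising idea: at high strength the sampler restarts from a latent that is almost pure noise, so little of the encoded mp3 survives. The schedule here is a made-up linear one for illustration.]

```python
# Toy illustration of diffusion "denoise strength" applied to an input latent.
# ASSUMPTION: a simple variance-preserving mix stands in for the real
# noise schedule; the qualitative effect (high strength -> input mostly
# replaced by noise) is what matters.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.standard_normal(4096)   # stand-in for a VAEEncodeAudio latent

def renoise(latent, strength, rng):
    """Mix the latent with Gaussian noise; strength=1.0 keeps none of it."""
    alpha = 1.0 - strength                       # fraction of signal kept (toy)
    noise = rng.standard_normal(latent.shape)
    return np.sqrt(alpha) * latent + np.sqrt(1.0 - alpha) * noise

for s in (0.5, 0.95, 1.0):
    start = renoise(latent, s, np.random.default_rng(1))
    # correlation between the sampler's starting point and the input latent
    corr = np.corrcoef(latent, start)[0, 1]
    print(f"denoise={s:4.2f}  correlation with input latent: {corr:+.2f}")
```

At 0.5 the starting point still correlates strongly with the input; at 1.0 it is essentially independent noise, which matches the observation that the source mp3 mostly just seeds the output.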