Katsuya
@kn
Does anyone know of open-source models or papers for splitting audio (containing human speech and other sounds) into two separate audio files, one with only the human speech and one with the other sounds?
5 replies
1 recast
3 reactions

Giuliano Giacaglia
@giu
I’ve looked for a good tool for this, to extract the human speech track for speech synthesis, but didn’t find anything that worked well.
1 reply
0 recast
1 reaction

Katsuya
@kn
Interesting. It seems doable, and the dataset is relatively easy to synthesize.
1 reply
0 recast
0 reaction

Giuliano Giacaglia
@giu
The problem is the dataset, but I agree that once you have it, this shouldn’t be hard to do.
2 replies
0 recast
0 reaction

Giuliano Giacaglia
@giu
Also, that’s why speech synthesis doesn’t work well for every public figure. For now, you need enough clean recordings of someone’s voice. Joe Rogan, for example, is a great subject for this.
0 reply
0 recast
0 reaction

Katsuya
@kn
Yeah, I don’t have good intuition for the size of dataset required, but my guess is a lot less than TTS, maybe similar to ASR given the one-to-many problem. So I was thinking there are enough public speech datasets (>10k hrs) plus non-speech datasets that can be mixed together to synthesize training data.
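
For anyone who wants to try the mixing idea, here is a rough sketch of how one training example could be synthesized. It is only an illustration: the function name `mix_at_snr`, the SNR-based scaling, and looping the background to fit the speech length are all assumptions, not something from a specific paper or tool. It assumes both clips are mono float arrays at the same sample rate.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a speech clip with a non-speech clip at a target SNR (in dB).

    Hypothetical helper: returns (mixture, scaled_noise). The mixture is the
    model input; the clean speech and scaled_noise are the two targets.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the background if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # Choose the gain so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = noise * gain
    return speech + scaled_noise, scaled_noise
```

Sampling `snr_db` at random per example (say, from a range of low to high values) would probably give a more varied training set than a single fixed level, but that range is a guess to tune.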
1 reply
0 recast
0 reaction