jtgi pfp
jtgi
@jtgi
Evolving my voice playground to have good voice activity detection, aka vad. I was using a basic heuristic, "was quiet for 500ms" but it's too aggressive and cuts you off while you're thinking. That really bothers me about openai's implementation. The proper way is a model that understands what you're saying and if you finished a complete thought, aka semantics. So I'm experimenting with that but the catch is it has to be low latency (< 500ms). So I found a couple lightweight models that can run in the browser like Silero, it compresses down to 2mb, you can run it in via the ONNX runtime.
1 reply
2 recasts
18 reactions

erica pfp
erica
@erica
that is also one of my biggest frustrations with the openAI voice mode also, i get so angry when it cuts me off haha curious to see if this works out!
0 reply
0 recast
1 reaction