gm8xx8
@gm8xx8
Kyutai Labs has open-sourced Moshi, a 7.6B speech-to-speech foundation model, and Mimi, a SoTA streaming speech codec. The release includes Moshi models fine-tuned on synthetic data, plus Mimi, which encodes 24 kHz audio at a bitrate of 1.1 kbps. The models are optimized for low-latency, on-device use and support inference via Candle, PyTorch, and MLX. https://huggingface.co/collections/kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd
1 reply
0 recast
16 reactions
Jorge Pablo Franetovic 🎩
@jpfraneto.eth
what use cases do you envision these bringing? live translation on-site?
0 reply
0 recast
1 reaction