deboboy on Farcaster

https://warpcast.com/~/channel/gm8xx8

𝚐𝔪𝟾𝚡𝚡𝟾

Moshi 🔥
- 1. 7b multimodal LM
- will be released as open source!!
- achieves 160ms latency 🤌✨
- trained on Scaleway cluster of 1000 H100 GPUs
- expresses emotions and understands accents, like a “french accent.”
- handles audio generation and listening simultaneously.
- processes thoughts textually during speech.
- uses dual audio streams for simultaneous listening and speaking.
- jointly pre-trained on text and audio.
- utilizes synthetic text from the 7b LLM Helium and fine-tuned on 100k TTS-converted “oral-style” conversations.
- voice learned from TTS-generated data.
- achieves 200ms end-to-end latency.
- includes a smaller version for macbooks or consumer GPUs.
- implements watermarking to identify AI-generated audio (in progress).

𝚐𝔪𝟾𝚡𝚡𝟾

i tried the demo and imo it still has a long way to go. now if they in fact release code, model, and paper, i believe the community will improve on it and it will be a much better assistant. …

some notes:
- 1. 7B Multimodal LM
- Moshi already runs on apple laptops! (can run on laptop / on consumer GPUs) big w!
- local: no data leaving your computer, no internet access.
- open source! technical report and open model releases🤞
- latency +
- trained by 8+ people in 4 months
- used a heavy amount of synthetic data
- didn’t get the emotion
- missed simple scheduling prompts
- no multilingual
- needs fine-tuning

try demo ↓ https://moshi.chat/?queue_id=talktomoshi

Frank

nice summary… did you have to design a prompt to elicit emotion?

𝚐𝔪𝟾𝚡𝚡𝟾

i couldn’t capture different emotions; even simple prompts resulted in monotone responses for anger, sadness, and happiness. as the prompts grew more complex, the results strayed further from the intended task, and even repeating previous tasks became increasingly tedious. i was able to get it to whisper quite easily, but when asked to revert it gave this response. (will spend more time with it again later)
