𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
Zyphra introduced Zamba2-7B, a new SLM that surpasses competitors like Mistral-7B, Gemma-7B, and Llama3-8B in both performance & efficiency. Designed for on-device use, consumer GPUs, and enterprise applications.

Zamba2-7B:
> Achieves top benchmark results with faster inference: 25% quicker first-token generation and 20% more tokens per second compared to similar models.
> Uses Mamba2 blocks w/ two shared attention blocks interleaved in an ABAB pattern, enhancing cross-sequence dependencies.
> Trained on 3T tokens, combining proprietary Zyda data w/ high-quality, deduplicated datasets, followed by a specialized annealing phase to improve model precision.
> Achieves high throughput and low memory usage, eliminating the need for KV-caches & leveraging efficient Mamba blocks optimized for parallel hardware.
> Trained on 128 H100 GPUs over 50 days; will be available open-source under Apache 2.0, w/ integrations on Hugging Face and PyTorch.
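The "two shared attention blocks interleaved in an ABAB pattern" idea can be sketched in plain Python (no ML framework). Note the block count (`n_mamba_blocks=12`) and interleaving interval (`interleave_every=3`) are illustrative assumptions, not Zyphra's actual configuration; the point is only that the same two attention blocks are reused (parameters tied) rather than fresh attention layers being created each time:

```python
# Sketch of a Zamba2-style layer schedule: a stack of Mamba2 blocks
# with two *shared* attention blocks A and B interleaved in an
# ABAB pattern. Counts/intervals here are illustrative assumptions.

class Block:
    def __init__(self, kind, ident):
        self.kind = kind    # "mamba2" or "attention"
        self.ident = ident  # block label ("A"/"B" for the shared pair)

def build_layer_schedule(n_mamba_blocks=12, interleave_every=3):
    """Return a layer list: Mamba2 blocks with the two shared
    attention blocks alternating between them (A, B, A, B, ...)."""
    shared = [Block("attention", "A"), Block("attention", "B")]
    layers = []
    for i in range(n_mamba_blocks):
        layers.append(Block("mamba2", f"m{i}"))
        if (i + 1) % interleave_every == 0:
            # Reuse (not copy) the shared blocks -> parameters are tied,
            # so attention adds little to the parameter/memory budget.
            layers.append(shared[(i // interleave_every) % 2])
    return layers

schedule = build_layer_schedule()
attn = [b for b in schedule if b.kind == "attention"]
print([b.ident for b in attn])  # ['A', 'B', 'A', 'B']
# Same two objects reused -> shared weights, the ABAB pattern.
assert attn[0] is attn[2] and attn[1] is attn[3]
```

Because the attention layers are shared and the backbone is Mamba2 (stateful, no per-token KV growth), this layout is consistent with the low-memory, KV-cache-free claim in the post.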
1 reply
9 recasts
50 reactions

𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
Zamba2-7B ☺︎
Blog: https://www.zyphra.com/post/zamba2-7b
Weights: https://huggingface.co/Zyphra/Zamba2-7B
Instruct: https://huggingface.co/Zyphra/Zamba2-7B-Instruct
Chat with the model: https://maia.zyphra.com/chat
Demo: https://huggingface.co/spaces/Zyphra/Zamba2-7B
NVIDIA’s NIM: https://build.nvidia.com/zyphra/zamba2-7b-instruct
0 reply
0 recast
3 reactions