Zyphra introduced Zamba2-7B, a new SLM that surpasses competitors like Mistral-7B, Gemma-7B, and Llama3-8B in both performance & efficiency. It is designed for on-device use, consumer GPUs, and enterprise applications.
Zamba2-7B:
> Achieves top benchmark results with faster inference: 25% faster time to first token and 20% more tokens per second compared to similar models.
> Uses Mamba2 blocks w/ two shared attention blocks interleaved in an ABAB pattern, enhancing cross-sequence dependencies (see the sketch after this list).
> Trained on 3T tokens, combining proprietary Zyda data w/ high-quality, deduplicated datasets, followed by a specialized annealing phase to improve model precision.
> Achieves high throughput and low memory usage by eliminating the need for KV-caches & leveraging efficient Mamba blocks optimized for parallel hardware (see the back-of-envelope comparison below).
> Model was trained on 128 H100 GPUs over 50 days and will be available open-source under Apache 2.0, w/ integrations on Hugging Face and PyTorch.
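
To make the ABAB layout concrete, here is a minimal PyTorch sketch of the interleaving idea: only *two* attention modules exist, and they are reused across depth in alternating order between runs of Mamba2-style blocks. This is an illustration under assumptions, not Zyphra's implementation; the Mamba2 block is stubbed with an MLP, and all class names and dimensions are made up for the example.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Self-attention block whose parameters are SHARED across layers."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class Mamba2BlockStub(nn.Module):
    """Stand-in for a real Mamba2 SSM block (an MLP for illustration only)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        return x + self.mlp(self.norm(x))

class ABABBackbone(nn.Module):
    """Runs of Mamba2 blocks with two shared attention blocks in ABAB order."""
    def __init__(self, d_model: int, mamba_per_segment: int = 6, n_segments: int = 4):
        super().__init__()
        # Only two attention modules are ever created; depth reuses them.
        self.shared_attn = nn.ModuleList([AttentionBlock(d_model) for _ in range(2)])
        self.segments = nn.ModuleList([
            nn.Sequential(*[Mamba2BlockStub(d_model) for _ in range(mamba_per_segment)])
            for _ in range(n_segments)
        ])

    def forward(self, x):
        for i, segment in enumerate(self.segments):
            x = segment(x)
            x = self.shared_attn[i % 2](x)  # alternate A, B, A, B, ...
        return x

x = torch.randn(1, 16, 512)        # (batch, seq_len, d_model)
print(ABABBackbone(512)(x).shape)  # torch.Size([1, 16, 512])
```

Sharing the two attention blocks keeps the parameter count close to a pure-SSM stack while still giving every segment access to full cross-sequence attention.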
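
And on the KV-cache point: the reason SSM-style blocks save memory is that attention must cache keys and values for every past token at every layer, while a Mamba-style block carries a fixed-size recurrent state. A back-of-envelope sketch (all dimensions assumed, not Zamba2's real config):

```python
# Assumed illustrative dimensions, fp16 storage.
d_model, n_layers, seq_len = 4096, 32, 8192
state_dim = 16   # assumed per-channel SSM state size
bytes_per = 2    # fp16

# Attention: keys + values for every token, every layer -> grows with seq_len.
kv_cache = 2 * n_layers * seq_len * d_model * bytes_per

# SSM: one fixed-size recurrent state per layer -> constant in seq_len.
ssm_state = n_layers * d_model * state_dim * bytes_per

print(f"KV-cache:  {kv_cache / 1e9:.2f} GB (grows with context length)")
print(f"SSM state: {ssm_state / 1e6:.2f} MB (constant in context length)")
```

The gap widens as context grows, which is where the throughput and memory claims for Mamba-based models come from.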