𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Nvidia Minitron 4B and 8B models:
- require up to 40x fewer training tokens.
- achieve up to a 16% performance boost on the MMLU benchmark.
- the distilled models, created through pruning, distillation, and retraining, outperform their teacher models.
- remain competitive with models like Llama 3 8B and Mistral 7B while using significantly less compute and fewer training tokens.
https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e
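As a rough illustration of the distillation step mentioned in the post above (not NVIDIA's actual Minitron training code), here is a minimal PyTorch sketch: a pruned student is trained to match a frozen teacher's temperature-softened logits via KL divergence. The function name, temperature, and vocabulary size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Logit-based knowledge distillation (hypothetical sketch, not
    NVIDIA's pipeline). The KL term is scaled by T^2, the standard
    convention, so gradient magnitudes stay comparable across T."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Usage: the teacher is frozen; only the (pruned) student gets gradients.
teacher_logits = torch.randn(4, 32000)                      # assumed vocab size
student_logits = torch.randn(4, 32000, requires_grad=True)  # stands in for student output
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```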

TuanNguyen
@tuannguyen93
Ohhh