@gm8xx8
Nvidia Minitron 4B and 8B models:
- require 40x fewer training tokens
- achieve a 16% performance boost on the MMLU benchmark
- the distilled models, created through pruning, distillation, and retraining, outperform their teacher models
- remain competitive with models like Llama 3 8B and Mistral 7B while using significantly less compute and fewer training tokens
https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e
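The "distillation" step mentioned above generally means training the pruned student model to match the teacher's output distribution rather than hard labels. A minimal sketch of a logit-distillation loss in plain Python, for illustration only — the temperature value and function names are my assumptions, not NVIDIA's actual training code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between temperature-softened
    distributions, scaled by T^2 (the classic Hinton et al. formulation).
    A higher temperature exposes more of the teacher's 'dark knowledge'
    about relative probabilities of wrong classes."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Identical logits give zero loss; mismatched logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1]))
```

In practice this term is usually mixed with a standard cross-entropy loss on ground-truth labels, and the Minitron recipe additionally prunes the teacher's width/depth before this retraining phase.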