@gm8xx8
Nvidia Minitron 4B and 8B models:
- require 40x fewer training tokens
- achieve a 16% performance boost on the MMLU benchmark
- the distilled models, created through pruning, distillation, and retraining, outperform their teacher models
- remain competitive with models like Llama 3 8B and Mistral 7B while using significantly less compute and fewer training tokens
https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e
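The "distillation" step mentioned above generally means training the pruned student model to match the teacher's output distribution rather than hard labels. A minimal sketch of a logit-distillation loss in plain Python, for illustration only — the temperature value and function names are my assumptions, not NVIDIA's actual training code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between temperature-softened
    distributions, scaled by T^2 (the classic Hinton et al. formulation).
    A higher temperature exposes more of the teacher's 'dark knowledge'
    about relative probabilities of wrong classes."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Identical logits give zero loss; mismatched logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1]))
```

In practice this term is usually mixed with a standard cross-entropy loss on ground-truth labels, and the Minitron recipe additionally prunes the teacher's width/depth before this retraining phase.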