𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Nemotron-4 340B: 340B parameters, 4k context window
- models: base, reward, instruct
- dual-phase training on 9 trillion tokens
- supports English, 50+ other languages, and 40+ programming languages
- operational requirements: 16 H100 GPUs in bf16, 8 in int4 (arithmetic sketched below)
- performance: 81.1 MMLU, 90.53 HellaSwag, 85.44 BBH
- alignment techniques: SFT, DPO, RPO
- squared ReLU activation, unlike Llama's SwiGLU or Gemma's GeGLU (sketched below)
(models and technical report below)
https://huggingface.co/collections/nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
https://research.nvidia.com/publication/2024-06_nemotron-4-340b
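The GPU counts follow from weight-memory arithmetic: 340B parameters at 2 bytes each (bf16) is roughly 680 GB of weights, versus roughly 170 GB at int4. A back-of-the-envelope check in Python, assuming 80 GB of HBM per H100 and counting weights only (KV cache and activations need extra headroom on top):

```python
PARAMS = 340e9  # parameter count from the post
H100_GB = 80    # assumption: 80 GB HBM per H100

for precision, bytes_per_param, gpus in [("bf16", 2.0, 16), ("int4", 0.5, 8)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{precision}: {weights_gb:.0f} GB of weights over {gpus} H100s "
          f"= {weights_gb / gpus:.1f} GB/GPU of {H100_GB} GB")
# bf16: 680 GB of weights over 16 H100s = 42.5 GB/GPU of 80 GB
# int4: 170 GB of weights over 8 H100s = 21.2 GB/GPU of 80 GB
```

The leftover capacity per GPU is what serves the KV cache and activations, which is why the counts are not tighter.

On the activation choice: squared ReLU, f(x) = relu(x)^2, replaces the gated units used by Llama and Gemma and drops the third projection matrix a gated block needs. A minimal PyTorch sketch of both feed-forward styles; the dimensions are illustrative, not Nemotron's actual sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SquaredReLUMLP(nn.Module):
    """Feed-forward block with squared ReLU: down(relu(up(x))^2)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.relu(self.up(x)) ** 2)

class SwiGLUMLP(nn.Module):
    """Llama-style gated block for comparison: down(silu(gate(x)) * up(x)).
    Note the extra gate projection that squared ReLU avoids."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 512)
print(SquaredReLUMLP(512, 2048)(x).shape)  # torch.Size([2, 16, 512])
print(SwiGLUMLP(512, 2048)(x).shape)       # torch.Size([2, 16, 512])
```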
0 reply
2 recasts
22 reactions