𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Nemotron-4 340B: 340B parameters, 4k context window
- models: base, reward, instruct
- dual-phase training on 9 trillion tokens
- supports English, 50+ other languages, and 40+ programming languages
- operational requirements: 16 H100 GPUs in bf16, 8 in int4 (arithmetic sketched below)
- performance: 81.1 MMLU, 90.53 HellaSwag, 85.44 BBH
- alignment techniques: SFT, DPO, RPO
- squared ReLU activation, unlike Llama's SwiGLU or Gemma's GeGLU (sketched below)
(models and technical report below)
https://huggingface.co/collections/nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
https://research.nvidia.com/publication/2024-06_nemotron-4-340b
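The GPU counts follow from weight-memory arithmetic: 340B parameters at 2 bytes each (bf16) is roughly 680 GB of weights, versus roughly 170 GB at int4. A back-of-the-envelope check in Python, assuming 80 GB of HBM per H100 and counting weights only (KV cache and activations need extra headroom on top):

```python
PARAMS = 340e9  # parameter count from the post
H100_GB = 80    # assumption: 80 GB HBM per H100

for precision, bytes_per_param, gpus in [("bf16", 2.0, 16), ("int4", 0.5, 8)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{precision}: {weights_gb:.0f} GB of weights over {gpus} H100s "
          f"= {weights_gb / gpus:.1f} GB/GPU of {H100_GB} GB")
# bf16: 680 GB of weights over 16 H100s = 42.5 GB/GPU of 80 GB
# int4: 170 GB of weights over 8 H100s = 21.2 GB/GPU of 80 GB
```

The leftover capacity per GPU is what serves the KV cache and activations, which is why the counts are not tighter.

On the activation choice: squared ReLU, f(x) = relu(x)^2, replaces the gated units used by Llama and Gemma and drops the third projection matrix a gated block needs. A minimal PyTorch sketch of both feed-forward styles; the dimensions are illustrative, not Nemotron's actual sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SquaredReLUMLP(nn.Module):
    """Feed-forward block with squared ReLU: down(relu(up(x))^2)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.relu(self.up(x)) ** 2)

class SwiGLUMLP(nn.Module):
    """Llama-style gated block for comparison: down(silu(gate(x)) * up(x)).
    Note the extra gate projection that squared ReLU avoids."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 512)
print(SquaredReLUMLP(512, 2048)(x).shape)  # torch.Size([2, 16, 512])
print(SwiGLUMLP(512, 2048)(x).shape)       # torch.Size([2, 16, 512])
```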
0 reply
2 recasts
22 reactions