Llama 3.1 was trained on 15T tokens (knowledge cutoff December 2023), with a pretraining mix of roughly 50% general knowledge, 25% math/reasoning, 17% code, and 8% multilingual data. Fine-tuning added about 25M synthetic examples (e.g., 2.7M coding dialogues), generated partly by the 405B model itself, to strengthen skills like coding, multilingual support, and long-context tasks over the 128K-token window. The vocabulary is 128,256 tokens; reported throughput is 457 tokens/s for the 70B model and 129 tokens/s for the 405B. Preprocessing filtered NSFW content and duplicates, and training ran on 16K H100 GPUs at roughly 400 TFLOPS each. Compared to Llama 3's roughly 95% English data, the mix is more diverse, though synthetic data may introduce its own biases.
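To put that data mix in perspective, here's a rough back-of-the-envelope sketch (my own illustration, not something from the Llama 3 paper) that converts the reported percentages of the 15T-token corpus into absolute token counts:

```python
# Rough breakdown of the reported 15T-token pretraining mix.
# Percentages are taken from the post above; the script is only an illustration.

TOTAL_TOKENS = 15e12  # 15 trillion pretraining tokens

mix = {
    "general knowledge": 0.50,
    "math/reasoning":    0.25,
    "code":              0.17,
    "multilingual":      0.08,
}

for category, share in mix.items():
    tokens = TOTAL_TOKENS * share
    print(f"{category:18s} {share:5.0%}  ~{tokens / 1e12:.2f}T tokens")

# Sanity check: the shares should cover the whole corpus.
assert abs(sum(mix.values()) - 1.0) < 1e-9
```

Run as-is, this prints roughly 7.5T general-knowledge, 3.75T math/reasoning, 2.55T code, and 1.2T multilingual tokens.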