The Meta Llama 3.1 series includes the 405B, 70B, and 8B models.
Key details:
- training: over 15.6 trillion tokens, supplemented with synthetic data.
- performance: MMLU scores of 85.2 (405B), 79.3 (70B), and 66.7 (8B).
- multilingual support: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
- architecture: updated Llama architecture with scaled RoPE to support the longer context window.
- quantization: supports AWQ, bitsandbytes, and GPTQ to manage GPU memory requirements (see the loading sketch after this list).
- FP16 weights and a static FP8 quant for the 405B.
- security: ships with Prompt Guard and Llama Guard 3 8B.
- consumed 39.3 million GPU hours in training.
- 128K context length.
- robust tool-use and agent capabilities.
- dedicated pad token.
- data quality: low-quality instruction samples filtered using a reward model and LLM-as-a-judge, with additional signals from InsTag (see the filtering sketch below).
- the best open-source LLMs currently available.
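To make the quantization point concrete, here is a minimal 4-bit loading sketch using transformers + bitsandbytes; the repo id, NF4 settings, and prompt are illustrative assumptions, not details from the cast:

```python
# Minimal 4-bit loading sketch (assumes transformers, bitsandbytes, and
# accelerate are installed, and access to the gated repo has been granted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed Hugging Face repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights cut VRAM roughly 4x vs FP16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # shard across available GPUs
)

prompt = "Explain RoPE in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same pattern works for AWQ or GPTQ checkpoints by pointing model_id at a pre-quantized repo instead of passing a bitsandbytes config.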
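And a toy sketch of the reward-model + LLM-as-a-judge filtering idea from the data-quality point; both scorers here are hypothetical stand-ins, not Meta's actual pipeline:

```python
# Toy quality-filtering sketch: keep a sample only if a reward model scores it
# above a threshold AND an LLM judge accepts it. Both scorers are placeholders;
# real ones would call trained models.

def reward_score(prompt: str, response: str) -> float:
    """Stand-in for a trained reward model returning a scalar quality score."""
    words = response.split()
    return len(set(words)) / max(len(words), 1)  # toy proxy: lexical diversity

def judge_accepts(prompt: str, response: str) -> bool:
    """Stand-in for an LLM-as-a-judge yes/no verdict."""
    return len(response) > 20  # toy proxy: reject trivially short answers

samples = [
    ("What is RoPE?", "Rotary position embeddings encode token positions as rotations."),
    ("What is RoPE?", "idk"),
]

REWARD_THRESHOLD = 0.5  # assumed cutoff
kept = [
    (p, r) for p, r in samples
    if reward_score(p, r) >= REWARD_THRESHOLD and judge_accepts(p, r)
]
print(f"kept {len(kept)} of {len(samples)} samples")
```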
https://warpcast.com/gm8xx8/0x5f9b5aeb