The Meta Llama 3.1 series includes the 405B, 70B, and 8B models.
Key details:
- training: over 15.6 trillion tokens, supplemented with synthetic data.
- performance: MMLU scores of 85.2 (405B), 79.3 (70B), and 66.7 (8B).
- multilingual support: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
- architecture: updated Llama architecture with scaled RoPE to support the longer context window.
- quantization: supports AWQ, bitsandbytes, and GPTQ to manage GPU memory requirements (see the loading sketch after this list).
- FP16 weights and a static FP8 quant for the 405B.
- security: ships with Prompt Guard and Llama Guard 3 8B.
- consumed 39.3 million GPU hours in training.
- 128K context length.
- robust tool-use and agent capabilities.
- dedicated pad token.
- data quality: low-quality instruction samples filtered using a reward model and LLM-as-a-judge, with additional signals from InsTag (see the filtering sketch below).
- the best open-source LLMs currently available.
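To make the quantization point concrete, here is a minimal 4-bit loading sketch using transformers + bitsandbytes; the repo id, NF4 settings, and prompt are illustrative assumptions, not details from the cast:

```python
# Minimal 4-bit loading sketch (assumes transformers, bitsandbytes, and
# accelerate are installed, and access to the gated repo has been granted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed Hugging Face repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights cut VRAM roughly 4x vs FP16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # shard across available GPUs
)

prompt = "Explain RoPE in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same pattern works for AWQ or GPTQ checkpoints by pointing model_id at a pre-quantized repo instead of passing a bitsandbytes config.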
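And a toy sketch of the reward-model + LLM-as-a-judge filtering idea from the data-quality point; both scorers here are hypothetical stand-ins, not Meta's actual pipeline:

```python
# Toy quality-filtering sketch: keep a sample only if a reward model scores it
# above a threshold AND an LLM judge accepts it. Both scorers are placeholders;
# real ones would call trained models.

def reward_score(prompt: str, response: str) -> float:
    """Stand-in for a trained reward model returning a scalar quality score."""
    words = response.split()
    return len(set(words)) / max(len(words), 1)  # toy proxy: lexical diversity

def judge_accepts(prompt: str, response: str) -> bool:
    """Stand-in for an LLM-as-a-judge yes/no verdict."""
    return len(response) > 20  # toy proxy: reject trivially short answers

samples = [
    ("What is RoPE?", "Rotary position embeddings encode token positions as rotations."),
    ("What is RoPE?", "idk"),
]

REWARD_THRESHOLD = 0.5  # assumed cutoff
kept = [
    (p, r) for p, r in samples
    if reward_score(p, r) >= REWARD_THRESHOLD and judge_accepts(p, r)
]
print(f"kept {len(kept)} of {len(samples)} samples")
```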
https://warpcast.com/gm8xx8/0x5f9b5aeb