๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ pfp
๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ
@gm8xx8
The Meta Llama 3.1 series includes the 405B, 70B, and 8B models. Key details:
- Training: over 15.6 trillion tokens, plus additional synthetic outputs.
- Performance: MMLU scores of 85.2 (405B), 79.3 (70B), and 66.7 (8B).
- Multilingual support: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
- Architecture: updated Llama architecture with RoPE.
- Quantization: supports AWQ, bitsandbytes, and GPTQ to manage GPU requirements effectively (see the sketch after this list).
- fp16 and static fp8 quants for the 405B.
- Security: includes Prompt Guard and Llama Guard 8B.
- Training compute: 39.3 million GPU hours.
- 128K context length.
- Robust tool-use and agent capabilities.
- Dedicated pad token.
- Data quality: bad instruction samples filtered using a reward model and LLM-as-a-judge, with additional insights from Instag.
- The best open-source LLMs currently available.
https://warpcast.com/gm8xx8/0x5f9b5aeb
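The quantization point maps directly to code. Below is a minimal sketch of loading the 8B model with 4-bit bitsandbytes quantization through Hugging Face transformers; the repo id meta-llama/Meta-Llama-3.1-8B-Instruct, the gated-access setup, and the short generation snippet are assumptions for illustration, not anything stated in the cast.

```python
# Minimal sketch: load Llama 3.1 8B with 4-bit bitsandbytes quantization.
# Assumes `transformers` and `bitsandbytes` are installed and that you have
# accepted the gated license for the (assumed) repo id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed HF repo id

# NF4 4-bit weights with bf16 compute keep the 8B model within a single
# consumer GPU's memory budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Quick smoke test using the chat template.
messages = [{"role": "user", "content": "gm"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

AWQ and GPTQ checkpoints load the same way once the corresponding community quants exist; bitsandbytes is shown here only because it quantizes the fp16 weights on the fly with no separate conversion step.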
3 replies
1 recast
8 reactions

TuanNguyen pfp
TuanNguyen
@tuannguyen93
Nice ("hay đấy")
0 reply
0 recast
0 reaction