𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Effective Long-Context Scaling of Foundation Models https://arxiv.org/abs/2309.16039
1 reply
0 recast
0 reaction

𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
New long-context LLMs extend the game with context windows of up to 32,768 tokens. Built from Llama 2 via continual pretraining, the 70B variant surpasses even gpt-3.5-turbo-16k in overall performance. A must-read for anyone interested in efficient and effective long-context pretraining methods. Rough sketch of the RoPE tweak below.
0 reply
0 recast
0 reaction
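
For context on the "how": as I read the paper, the models are continually pretrained from Llama 2 on 32,768-token sequences, and the main architectural change is a larger RoPE base frequency (increased from Llama 2's 10,000), which slows the positional rotation so attention between far-apart tokens decays less. Below is a minimal PyTorch sketch of rotary embeddings with an adjustable base; the function names and the specific 500,000 base value are illustrative assumptions, not taken from the paper's code.

```python
import torch

def rope_angles(head_dim: int, max_pos: int, base: float = 10_000.0) -> torch.Tensor:
    # One rotation frequency per pair of dimensions; a larger `base` lowers the
    # frequencies, so distant positions rotate apart more slowly.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float()
    return torch.outer(positions, inv_freq)          # (max_pos, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # Rotate query/key vectors pairwise; x is (..., seq_len, head_dim).
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Llama 2 default (base 10,000, 4k context) vs. a long-context variant
# (32k context with a much larger base) -- the exact base here is an assumption.
angles_4k  = rope_angles(head_dim=128, max_pos=4_096,  base=10_000.0)
angles_32k = rope_angles(head_dim=128, max_pos=32_768, base=500_000.0)

q = torch.randn(1, 32_768, 128)
print(apply_rope(q, angles_32k).shape)  # torch.Size([1, 32768, 128])
```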