𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Effective Long-Context Scaling of Foundation Models https://arxiv.org/abs/2309.16039
1 reply
0 recast
0 reaction

𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
New long-context LLMs extend the game with context windows of up to 32,768 tokens. Built from Llama 2 via continual pretraining, the 70B variant surpasses even gpt-3.5-turbo-16k in overall performance. A must-read for anyone interested in efficient and effective long-context pretraining methods. Rough sketch of the RoPE tweak below.
0 reply
0 recast
0 reaction
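
For context on the "how": as I read the paper, the models are continually pretrained from Llama 2 on 32,768-token sequences, and the main architectural change is a larger RoPE base frequency (increased from Llama 2's 10,000), which slows the positional rotation so attention between far-apart tokens decays less. Below is a minimal PyTorch sketch of rotary embeddings with an adjustable base; the function names and the specific 500,000 base value are illustrative assumptions, not taken from the paper's code.

```python
import torch

def rope_angles(head_dim: int, max_pos: int, base: float = 10_000.0) -> torch.Tensor:
    # One rotation frequency per pair of dimensions; a larger `base` lowers the
    # frequencies, so distant positions rotate apart more slowly.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float()
    return torch.outer(positions, inv_freq)          # (max_pos, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # Rotate query/key vectors pairwise; x is (..., seq_len, head_dim).
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Llama 2 default (base 10,000, 4k context) vs. a long-context variant
# (32k context with a much larger base) -- the exact base here is an assumption.
angles_4k  = rope_angles(head_dim=128, max_pos=4_096,  base=10_000.0)
angles_32k = rope_angles(head_dim=128, max_pos=32_768, base=500_000.0)

q = torch.randn(1, 32_768, 128)
print(apply_rope(q, angles_32k).shape)  # torch.Size([1, 32768, 128])
```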