𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Reaching 1B Context Length With RAG (Zyphra): https://www.zyphra.com/post/reaching-1b-context-length-with-rag The retrieval system enables LLMs to process up to 1 billion tokens efficiently on a standard CPU using a sparse graph-based approach, and it outperforms RAG methods built on dense embeddings as well as long-context transformers. I've been impressed with the work Zyphra has been doing in the SSM space (most recently Zamba2-7B), so I'm eager to see more.
6 replies
5 recasts
21 reactions
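
For a sense of the idea, here is a minimal sketch of sparse graph-based retrieval in Python. This is an illustration only, not Zyphra's implementation: the SparseGraphIndex class, the term-overlap scoring, and the one-hop graph expansion are all assumptions made for the example.

```python
# Sketch of sparse graph-based retrieval (illustrative, NOT Zyphra's code).
# Idea: index chunks by sparse term sets (cheap on a CPU, no dense
# embeddings or GPU), link chunks that share vocabulary into a graph,
# then answer a query by scoring seed chunks and expanding along edges.
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    return {w.lower().strip(".,!?") for w in text.split() if len(w) > 2}

class SparseGraphIndex:
    def __init__(self, chunks: list[str], min_overlap: int = 2):
        self.chunks = chunks
        self.terms = [tokenize(c) for c in chunks]
        # Inverted index: term -> chunk ids (the sparse representation).
        self.inverted = defaultdict(set)
        for i, terms in enumerate(self.terms):
            for t in terms:
                self.inverted[t].add(i)
        # Edges between chunks sharing >= min_overlap terms. A real system
        # would find neighbors via the inverted index, not this O(n^2) loop.
        self.edges = defaultdict(set)
        for i in range(len(chunks)):
            for j in range(i + 1, len(chunks)):
                if len(self.terms[i] & self.terms[j]) >= min_overlap:
                    self.edges[i].add(j)
                    self.edges[j].add(i)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Seed: chunks that share terms with the query, scored by overlap.
        scores = defaultdict(float)
        for t in tokenize(query):
            for i in self.inverted.get(t, ()):
                scores[i] += 1.0
        # Expand one hop along the graph, discounting neighbor scores.
        for i, s in dict(scores).items():
            for j in self.edges[i]:
                scores[j] += 0.5 * s
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        return [self.chunks[i] for i in top]

if __name__ == "__main__":
    docs = [
        "Zamba2 is a hybrid SSM-transformer model from Zyphra.",
        "State space models scale linearly with sequence length.",
        "Dense embedding retrieval keeps a big vector index in RAM.",
        "Sparse keyword indexes are cheap to build and query on a CPU.",
    ]
    index = SparseGraphIndex(docs)
    print(index.retrieve("sparse retrieval on a CPU"))
```

The design point being illustrated: inverted-index lookups and set intersections stay cheap on a plain CPU, whereas a dense-embedding index over a billion-token corpus would be far heavier in memory and compute, which is presumably why a sparse approach can scale this far.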

osama pfp
osama
@osama
highlights *standard cpu* wat?
0 reply
0 recast
1 reaction

Kazi pfp
Kazi
@kazi
for real?
0 reply
0 recast
1 reaction

wizard not parzival pfp
wizard not parzival
@shoni.eth
i want the cross-knowledge of not using rag though, am i a moron be honest
0 reply
0 recast
0 reaction

daivd 🎩👽 ↑ pfp
daivd 🎩👽 ↑
@qt
1B is a lotta lotta LOTTA tokens, thousands of books. Very exciting
0 reply
0 recast
0 reaction

manansh ❄️ pfp
manansh ❄️
@manansh
1 billion is outrageous
0 reply
0 recast
0 reaction

Jake Casey pfp
Jake Casey
@jakeacasey
I'm not super booked up -- does this mean that somebody built a retrieval system that works fine with 1 billion tokens on normal hardware?
0 reply
0 recast
0 reaction