Stepan Gershuni pfp
Stepan Gershuni
@stepa
Everyone has forgotten about Sora; the new hit of the week is Groq. The team built a custom ASIC for LLM inference that generates ~500 tokens per second. For comparison, GPT averages around 30 tokens/s.
1 reply
0 recast
0 reaction

Stepan Gershuni pfp
Stepan Gershuni
@stepa
The ability to instantly and at effectively zero cost read, analyze, and generate dozens of pages of text improves the performance of AI systems. For clarity, consider the metric "LLM requests per user task":
1 reply
0 recast
0 reaction

Stepan Gershuni pfp
Stepan Gershuni
@stepa
- a chatbot: one request, one response
- a simple RAG makes 2-3 requests (search, reasoning, generating a response)
- a complex chain-of-thought / tree-of-thoughts pipeline can make 10 requests (clarifying the question, choosing a response strategy, generating candidates, selecting the best)
1 reply
0 recast
0 reaction
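The arithmetic behind the thread can be sketched in a few lines: combining generation speed (tokens/s) with the number of LLM calls per pipeline gives a rough sequential latency per user task. The per-request token count and the pipeline list below are illustrative assumptions, not figures from the casts; the 30 and 500 tokens/s rates are the ones cited above.

```python
# Rough latency-per-task estimate: requests * tokens per request / throughput.
# TOKENS_PER_REQUEST is an assumed average output length, purely illustrative.

TOKENS_PER_REQUEST = 500


def task_seconds(requests: int, tokens_per_sec: float) -> float:
    """Sequential wall-clock time for a task that makes `requests` LLM calls."""
    return requests * TOKENS_PER_REQUEST / tokens_per_sec


# Pipelines and request counts from the thread's "LLM requests per user task" metric.
for pipeline, n_requests in [("chatbot", 1), ("simple RAG", 3), ("tree of thoughts", 10)]:
    gpt = task_seconds(n_requests, 30)    # ~30 tokens/s (GPT, per the cast)
    groq = task_seconds(n_requests, 500)  # ~500 tokens/s (Groq ASIC, per the cast)
    print(f"{pipeline}: {gpt:.1f}s at 30 tok/s vs {groq:.1f}s at 500 tok/s")
```

The point the sketch makes concrete: at 30 tokens/s a 10-request pipeline is minutes of waiting, while at 500 tokens/s it stays within interactive latency, which is why faster inference enables multi-step pipelines rather than just faster chatbots.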