sid
@siddani
Well, I’ll be damned. I didn’t think it would actually happen, but as of today, Grok 3 is the best AI model out there. We have a new player in town. xAI just dropped Grok 3, its latest large language model, which ships with a reasoning engine and a smaller mini model. And it’s delivering some serious results:
• LMArena: 1400 Elo (#1 ranking)
• AIME 2024: 52% (96% with reasoning!)
• GPQA: 75% (85% with reasoning)
• LiveCodeBench (coding): 57% (80% with reasoning)
• AIME 2025 (math): 93%, outperforming o3-mini-high
The AI game just got interesting.

eirrann | he/him
@eirrann.eth
to what extent do these benchmarks correspond to actual user experience? is this something AI companies can 'game'?

sid
@siddani
these are internal company benchmarks, so it’s good practice to take them with a grain of salt. the model starts rolling out to users today, and soon, real-world testing will provide a more accurate comparison. that said, we rarely see a significant gap between company benchmarks and public evaluations, so these numbers are likely a solid indicator of Grok 3’s capabilities.

GIG☀️
@gig
LMArena is based on user votes. Users submit a prompt and get replies from two different AI models without knowing which is which, then vote on which reply was better.
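To show how blind pairwise votes can turn into a rating like "1400 Elo", here is a minimal sketch of a classic online Elo update from a single vote. It assumes the textbook logistic formula (base 10, scale 400) and a K-factor of 32; these are illustrative assumptions, and LMArena's published leaderboard has used a Bradley-Terry fit over all votes rather than this exact sequential update. The function names are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_score: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one blind vote.

    a_score is 1.0 if A's reply was preferred, 0.0 if B's was, 0.5 for a tie.
    k (assumed 32 here) controls how far a single vote moves the ratings.
    """
    e_a = expected_score(r_a, r_b)
    delta = k * (a_score - e_a)
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # Two hypothetical models start level; the voter prefers model A's reply.
    a, b = elo_update(1000.0, 1000.0, 1.0)
    print(a, b)  # 1016.0 984.0
```

Under this model a rating gap maps directly to a head-to-head win rate: a 100-point lead means the higher-rated model is expected to win about 64% of votes, which is why a #1 spot by a small margin is a narrower lead than it may sound.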