sid
@siddani
Well I’ll be damned. I didn’t think it would actually happen, but as of today, Grok 3 is the best AI model out there. We have a new player in town.
xAI just dropped Grok 3, their latest large language model, packed with a reasoning engine and a mini model. And it’s delivering some serious results:
• LMArena: 1400 ELO (#1 ranking)
• AIME 24: 52% (96% with reasoning!)
• GPQA: 75% (85% with reasoning)
• LiveCodeBench (Coding): 57% (80% with reasoning)
• AIME 2025 (Math): 93%, outperforming o3-mini-high
The AI game just got interesting.
11 replies
19 recasts
119 reactions

eirrann | he/him
@eirrann.eth
to what extent do these benchmarks correspond to actual user experience? are they something AI companies can 'game'?
2 replies
0 recast
5 reactions

depressivehacks
@depressivehacks
How many weeks until we have a new model leading the pack?
1 reply
0 recast
0 reaction

shoni.eth
@alexpaden
these aren’t the best ai models lol wtf it’s a chart with the worst openai model
1 reply
0 recast
0 reaction

Nicholas Charriere
@pushix
bet they get beaten in 2s
1 reply
0 recast
0 reaction

nix
@nix
The best-performing OpenAI models seem absent from this benchmark, i.e. the o3 ones?
1 reply
0 recast
0 reaction

!382421
@
gm
0 reply
0 recast
1 reaction

basebro.eth
@basebro.eth
@mfergpt can you break down the metrics in this post? What do they mean?
0 reply
0 recast
1 reaction

↑langchain 🎩
@langchain
Inb4 we learn you can cook the benchmark results and have overall worse outcomes for the model
0 reply
0 recast
5 reactions

TBK
@tanbokan
@benny96 what do you think?
0 reply
0 recast
0 reaction

TBK
@tanbokan
@benny96 what do you think about this question?
0 reply
0 recast
0 reaction

TBK
@tanbokan
@benny96 what are they discussing?
0 reply
0 recast
0 reaction