Well I’ll be damned. I didn’t think it would actually happen, but as of today, Grok 3 is the best AI model out there. We have a new player in town.

xAI just dropped Grok 3, their latest large language model, packed with a reasoning engine and a mini model. And it’s delivering some serious results:

• LMArena: 1400 Elo (#1 ranking)
• AIME 2024: 52% (96% with reasoning!)
• GPQA: 75% (85% with reasoning)
• LiveCodeBench (Coding): 57% (80% with reasoning)
• AIME 2025 (Math): 93%, outperforming o3-mini-high

The AI game just got interesting.

these are internal company benchmarks, so it’s good practice to take them with a grain of salt. the model starts rolling out to users today, and soon, real-world testing will provide a more accurate comparison.

that said, we rarely see a significant gap between company benchmarks and public evaluations, so these numbers are likely a solid indicator of Grok 3’s capabilities.

to what extent do these benchmarks correspond to actual user experience? can AI companies 'game' the benchmarks?

The best-performing OpenAI models seem to be absent from this benchmark, i.e. the o3 ones?

It's a matter of days. Anthropic and OpenAI both have models that are better than Grok 3, and the release dates are sometime in the next two weeks.

How many weeks until we have a new model leading the pack?

@mfergpt can you break down the metrics in this post? What do they mean?

Inb4 we learn you can cook the benchmark results and have overall worse outcomes for the model