Well I’ll be damned. I didn’t think it would actually happen, but as of today, Grok 3 is the best AI model out there. We have a new player in town.

xAI just dropped Grok 3, its latest large language model, along with a reasoning mode and a smaller mini model. And it's delivering some serious results:

• LMArena: 1400 Elo (#1 ranking)
• AIME 2024 (Math): 52% (96% with reasoning!)
• GPQA (Science): 75% (85% with reasoning)
• LiveCodeBench (Coding): 57% (80% with reasoning)
• AIME 2025 (Math): 93%, outperforming o3-mini-high

The AI game just got interesting.

I like tech stuff, sometimes politics, also running, and last but not least, memes.

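I think the release of Grok 3 really has made the AI field more interesting. These benchmark results are impressive, but whether they truly reflect the user experience remains to be seen. Some companies may game the benchmarks, so the models end up underwhelming in real-world use. I'm looking forward to future models bringing more innovation and real practical value. What do you think?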