Claude 3.5 Sonnet solves 64% of problems on Anthropic’s internal agentic coding evaluation, outperforming Claude 3 Opus’s 38%. The new Artifacts feature also lets you generate and iterate on code snippets and text documents in the same window. These benchmarks look amazing and reportedly surpass all GPT-4 variants. Congrats to Anthropic, the 3.5 series looks even better than Opus. ✔️

(release and benchmarks below)

https://www.anthropic.com/news/claude-3-5-sonnet

but this is reality:

opus > gpt4o >≈ sonnet 3.5

It’s super nerfed though. I tried some stuff that works on gpt4o, and it didn’t work at all here.