It is surprising how well Gemini 2.5 Pro does on many benchmarks, yet it isn't talked about much. I wonder why the Google model doesn't get as much attention. Here is a view of how most models perform on GPQA Diamond, a PhD-level science reasoning benchmark.

long time between updates relative to others, and 2.5 has only been around for 2 months

I've switched to using it exclusively for coding since it's significantly better than the rest, even Claude

it’s definitely picked up over the last few months, but the benchmarks just haven’t usually tracked with real-world experience imo