JB Rubinovitz ⌐◨-◨
@rubinovitz
When you dig into the o3 benchmarks, they don’t measure what Twitter folks say they do. E.g., the ability to fix isolated issues in Python repos != coding is over, but that’s what the SWE benchmark measures. That being said, if we assume exponential growth in models, we can get there. https://x.com/karpathy/status/1871312079145361645?s=46