𝚐𝔪𝟾𝚡𝚡𝟾 pfp
𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
OpenAI’s o1 update enhances reasoning through reinforcement learning, enabling step-by-step problem-solving similar to human thought. The longer it “thinks,” the better it performs, which introduces a new scaling paradigm beyond pretraining: rather than relying solely on prompting, o1’s chain-of-thought reasoning improves with adaptive compute that can be scaled at inference time.
- o1 outperforms GPT-4o in reasoning, ranking in the 89th percentile on Codeforces.
- It uses chain-of-thought to break down problems, correct errors, and adapt, though some specifics remain undisclosed.
- It excels in areas like data analysis, coding, and math.
- The o1-preview and o1-mini models are available now, with evals showing it’s not just a one-off improvement. Trusted API users will have access soon.
- Results on AIME and GPQA are strong, and o1 shows significant improvement on complex prompts where GPT-4o struggles.
- The system card (https://openai.com/index/openai-o1-system-card/) showcases o1’s best capabilities.
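For context, here is a minimal sketch of calling one of these models through the OpenAI Python SDK. It assumes the launch-era constraints (no system message, no temperature control) and the max_completion_tokens parameter; treat the details as illustrative, not authoritative:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# o1 does its chain-of-thought internally; the hidden "reasoning tokens"
# count against the completion budget, so give it generous headroom.
response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini"
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_completion_tokens=4096,  # reasoning + visible output share this budget
)
print(response.choices[0].message.content)

Note the design shift: the quality knob is no longer just the prompt, it’s how many tokens of private reasoning you let the model spend before it answers.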
7 replies
8 recasts
38 reactions

Danny pfp
Danny
@mad-scientist
This is both amazing and expected. Thinking "slow," and thinking about how to solve a problem before answering, has been discussed for a while. But this is also a fundamental challenge for OpenAI: so far the heavy lifting was the training, and inference, while not completely light, was where the reward was collected. Now inference becomes heavy as well, and open source models are probably not that far behind.
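One reason open source may not be far behind: the simplest public analogue of "spend more compute at inference" is self-consistency sampling (Wang et al., 2022), i.e. sample several reasoning chains and take the majority answer. That is not o1's actual mechanism, which OpenAI hasn't disclosed; the toy sketch below, with a hypothetical sample_chain stand-in for a real model call, just shows how accuracy scales with the number of samples:

import collections
import random

def sample_chain(question: str) -> str:
    # Hypothetical stand-in for one sampled chain-of-thought; in practice
    # this would be a temperature > 0 call to an open-weights model.
    # Toy behavior: a noisy solver that is right 60% of the time.
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def self_consistency(question: str, n: int) -> str:
    # Sample n chains and return the majority answer.
    answers = [sample_chain(question) for _ in range(n)]
    return collections.Counter(answers).most_common(1)[0][0]

# More samples (more inference compute) -> more reliable majority answer.
for n in (1, 5, 25):
    wins = sum(self_consistency("q", n) == "42" for _ in range(1000))
    print(f"n={n:>2}: majority correct in {wins / 10:.1f}% of trials")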
0 reply
0 recast
0 reaction