𝚐𝔪𝟾𝚡𝚡𝟾 pfp
𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
OpenAI’s o1 update enhances reasoning through reinforcement learning, enabling step-by-step problem-solving similar to human thought. The longer it “thinks,” the better it performs, which introduces a new scaling paradigm beyond pretraining: rather than relying solely on prompting, o1’s chain-of-thought reasoning improves with adaptive compute that can be scaled at inference time.
- o1 outperforms GPT-4o in reasoning, ranking in the 89th percentile on Codeforces.
- It uses chain-of-thought to break down problems, correct errors, and adapt, though some specifics remain undisclosed.
- It excels in areas like data analysis, coding, and math.
- The o1-preview and o1-mini models are available now, with evals showing it’s not just a one-off improvement. Trusted API users will have access soon.
- Results on AIME and GPQA are strong, and o1 shows significant improvement on complex prompts where GPT-4o struggles.
- The system card (https://openai.com/index/openai-o1-system-card/) showcases o1’s best capabilities.
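For context, here is a minimal sketch of calling one of these models through the OpenAI Python SDK. It assumes the launch-era constraints (no system message, no temperature control) and the max_completion_tokens parameter; treat the details as illustrative, not authoritative:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# o1 does its chain-of-thought internally; the hidden "reasoning tokens"
# count against the completion budget, so give it generous headroom.
response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini"
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_completion_tokens=4096,  # reasoning + visible output share this budget
)
print(response.choices[0].message.content)

Note the design shift: the quality knob is no longer just the prompt, it’s how many tokens of private reasoning you let the model spend before it answers.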
7 replies
8 recasts
38 reactions

Danny pfp
Danny
@mad-scientist
This is both amazing and expected. Thinking "slow," and thinking about how to solve a problem before answering, has been discussed for a while. But this is also a fundamental challenge for OpenAI: so far the heavy lifting was the training, and inference, while not completely light, was where the reward was collected. Now inference becomes heavy as well, and open source models are probably not that far behind.
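One reason open source may not be far behind: the simplest public analogue of "spend more compute at inference" is self-consistency sampling (Wang et al., 2022), i.e. sample several reasoning chains and take the majority answer. That is not o1's actual mechanism, which OpenAI hasn't disclosed; the toy sketch below, with a hypothetical sample_chain stand-in for a real model call, just shows how accuracy scales with the number of samples:

import collections
import random

def sample_chain(question: str) -> str:
    # Hypothetical stand-in for one sampled chain-of-thought; in practice
    # this would be a temperature > 0 call to an open-weights model.
    # Toy behavior: a noisy solver that is right 60% of the time.
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def self_consistency(question: str, n: int) -> str:
    # Sample n chains and return the majority answer.
    answers = [sample_chain(question) for _ in range(n)]
    return collections.Counter(answers).most_common(1)[0][0]

# More samples (more inference compute) -> more reliable majority answer.
for n in (1, 5, 25):
    wins = sum(self_consistency("q", n) == "42" for _ in range(1000))
    print(f"n={n:>2}: majority correct in {wins / 10:.1f}% of trials")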
0 reply
0 recast
0 reaction