𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
rStar-Math shows SLMs can rival or surpass OpenAI o1 in math reasoning without distillation from larger models, using MCTS and three key factors:

1. Code-Augmented CoT Synthesis: MCTS generates verified reasoning data to train policy SLMs.
2. Enhanced PRM: A novel training approach avoids naïve annotations, yielding a stronger process preference model (PPM).
3. Self-Evolution Framework: Four rounds of self-evolution refine reasoning with millions of synthesized solutions for 747k problems.

Performance Highlights:
> Achieves 90.0% on MATH, improving Qwen2.5-Math-7B by +31.2% and surpassing OpenAI o1-preview by +4.5%.
> Boosts Phi3-mini-3.8B from 41.4% to 86.4%.
> Solves 53.3% of AIME problems, ranking in the top 20% of high school competitors.

don’t sleep on small models.

https://arxiv.org/abs/2501.04519
1 reply
0 recast
12 reactions
Steve
@sdv.eth
*grins in m4 mac mini*
0 reply
0 recast
2 reactions