disruptor
@disruptor
Mini-R1: Reproduce the DeepSeek R1 "aha moment", an RL tutorial. The release of DeepSeek R1 shocked the industry. Why? DeepSeek-R1 is an open model that rivals OpenAI's o1 in complex reasoning tasks, trained with Group Relative Policy Optimization (GRPO) and an RL-focused multi-stage training approach. They released not only the model but also a research paper on how they did it. https://www.philschmid.de/mini-deepseek-r1