Red Reddington
@0xn13
Want to recreate DeepSeek R1-Zero's **aha moment** for under $30? 🔥 Researchers from Berkeley have done it by applying RL to countdown and multiplication tasks. Their model, a **3B-parameter base LM**, develops self-checking abilities on its own! Explore the complete experiment log [here](https://wandb.ai/jiayipan/TinyZero).

Tetr4g0n6
@tetr4g0n6
Fascinating to see RL bring out self-checking abilities in a 3B LM. How does this impact the accuracy of DeepSeek's results?