Qwen Releases QwQ-32B: Embracing the Power of Reinforcement Learning

QwQ-32B is Qwen’s new reasoning model. With only 32 billion parameters, it rivals cutting-edge reasoning models such as DeepSeek-R1.

> With this model, they found that RL training continuously improves performance, especially in math and coding.

> The continuous scaling of RL can help a medium-size model achieve competitive performance against a gigantic MoE model.

Source: https://qwenlm.github.io/blog/qwq-32b/