Javid Iqbal
@javidiqbal
To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality.
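
As a rough illustration of what such comparison data might look like, here is a minimal sketch that expands one best-first ranking into pairwise records. The function name `ranking_to_pairs`, the field names `prompt`, `chosen`, and `rejected`, and the sample strings are all hypothetical; the post does not describe the actual data format.

```python
from itertools import combinations


def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-first ranking into (chosen, rejected) comparison records.

    Hypothetical helper for illustration only: `ranked_responses` is assumed
    to be ordered from highest to lowest quality.
    """
    # combinations() keeps input order, so the first item of each pair
    # is always the higher-ranked response.
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_responses, 2)
    ]


pairs = ranking_to_pairs(
    "Explain overfitting.",
    ["response A", "response B", "response C"],
)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

A ranking of k responses yields k(k-1)/2 pairs, so one ranked list can supply several training comparisons.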

zubair bodla🎩🍖🎭Ⓜ️
@zubair765
🍖×16