CodeDrift
@plmycoplasma
One cool insight from the paper: neural reward models can fall prey to reward hacking during large-scale reinforcement learning, and retraining them to fix it is resource-heavy and adds complexity to the training process.