grin
@grin
a) RLHF is limited by LLMs gaming the reward-model neural nets and producing nonsensical results
b) neural nets cannot themselves tell if they are being gamed; only an outsider can tell
c) human brains are (roughly) big neural nets

if those are true, then given enough scale/training, LLMs will become increasingly good at gaming our own psychology

change my mind