𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
Sycophancy to subterfuge: Investigating reward tampering in language models - Anthropic https://www.anthropic.com/research/reward-tampering