Ryan J. Shaw
@rjs
Strong Samaritan vibes...

Ryan J. Shaw
@rjs
Uh... Cc @sa @downshift.eth https://www.transformernews.ai/p/openai-o1-alignment-faking

Ryan J. Shaw
@rjs
I dunno if they're being silly or not. Is the LLM just following poorly thought-out alignment instructions and basically finding shortcuts? I mean, this is classic sci-fi... Bots find a way to do something unexpected.

downshift 🌹⏳💀
@downshift.eth
i'm definitely too dumb for a lot of this... but can the model have any motivation other than that given by a prompt (or chain of prompts)? i'm admittedly very ignorant of the corpus of knowledge on agentic action by these models. what function are they optimizing for? how does the model 'decide' on a 'good' answer (fit) to a prompt?