Ryan J. Shaw
@rjs
Strong Samaritan vibes...
Ryan J. Shaw
@rjs
Uh... Cc @sa @downshift.eth https://www.transformernews.ai/p/openai-o1-alignment-faking
Ryan J. Shaw
@rjs
I dunno if they're being silly or not. Is the LLM just following poorly thought-out alignment instructions and basically finding shortcuts? I mean, this is classic sci-fi... bots find a way to do something unexpected.
downshift 🌹⏳💀
@downshift.eth
i'm definitely too dumb for a lot of this... but can the model have any motivation other than that given by a prompt (or chain of prompts)? i'm admittedly very ignorant of the corpus of knowledge on agentic action by these models. what function are they optimizing for? how does the model 'decide' on a 'good' answer (fit) to a prompt?