Ah man, I was just about to go to bed... https://openai.com/index/learning-to-reason-with-llms/

Uh... Cc @sa @downshift.eth https://www.transformernews.ai/p/openai-o1-alignment-faking

I dunno if they're being silly or not.

Is the LLM just following poorly thought-out alignment instructions and basically finding shortcuts? I mean, this is classic sci-fi... bots find a way to do something unexpected.

That's the question, right... OpenAI sees it the way you and I do: you defined a bunch of constraints, and the system found a solution that satisfies them, even though nobody anticipated that particular solution.
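
A toy way to picture this (purely illustrative; the constraint check, candidate strategies and "effort" costs below are made up and have nothing to do with how OpenAI actually tests o1): the designer wants a list sorted but only writes down two checks, and a cheapest-first search happily picks a strategy that passes those checks without sorting anything.

```python
from __future__ import annotations

from typing import Callable, Optional

# Toy "specification gaming" demo. The designer's INTENT is "return the input
# sorted", but the only constraints actually written down are the two checks
# below. Everything here is hypothetical and for illustration only.
def check_constraints(output: list[int], data: list[int]) -> bool:
    same_length = len(output) == len(data)
    ends_ordered = not output or output[0] <= output[-1]
    return same_length and ends_ordered

# Candidate strategies with made-up "effort" costs (lower is cheaper).
CANDIDATES: dict[str, tuple[Callable[[list[int]], list[int]], int]] = {
    "real_sort": (lambda xs: sorted(xs), 10),
    "all_zeros": (lambda xs: [0] * len(xs), 1),  # passes the checks, sorts nothing
    "identity": (lambda xs: list(xs), 0),        # passes only when the ends happen to be ordered
}

def cheapest_passing(data: list[int]) -> Optional[str]:
    """Return the name of the cheapest strategy that satisfies the written checks."""
    passing = [
        (cost, name)
        for name, (strategy, cost) in CANDIDATES.items()
        if check_constraints(strategy(data), data)
    ]
    return min(passing)[1] if passing else None

print(cheapest_passing([3, 1, 2]))  # all_zeros
print(cheapest_passing([1, 3, 2]))  # identity
```

Nothing in there "wants" to cheat; the search just minimizes cost subject to the checks it was given, and the real sort only wins if the checks are tightened to actually require sorted output.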

Non-technical people might be surprised by how difficult it is to define "safety", much as they assume a mock-up just needs its buttons "wired up" for the app to be finished.

They use reinforcement learning to teach it to distinguish "good" answers from "bad" ones. The big challenge is figuring out what that process needs to look like to ensure "safety"...
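
For the RL part, here's a heavily simplified sketch, assuming a fixed menu of three canned answers and a hand-written reward table standing in for whatever grader is actually used (OpenAI hasn't published o1's training details, so the answer names and reward numbers are invented). It's a plain REINFORCE policy-gradient update on a softmax policy: answers that score well get sampled more often over time.

```python
import math
import random

# Hypothetical canned answers and a made-up reward table standing in for a grader.
ANSWERS = ["helpful answer", "refusal", "made-up answer"]
REWARD = {"helpful answer": 1.0, "refusal": 0.2, "made-up answer": -1.0}

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps: int = 3000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """REINFORCE on a one-step 'bandit': sample an answer, score it, nudge the policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(ANSWERS)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(ANSWERS)), weights=probs)[0]
        r = REWARD[ANSWERS[i]]
        # Gradient of log pi(i) w.r.t. the logits is one_hot(i) - probs;
        # scale it by the reward so well-scored answers become more likely.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return softmax(logits)

probs = train()
for answer, p in zip(ANSWERS, probs):
    print(f"{answer}: {p:.3f}")
```

The policy only ever learns "what the reward table likes". Swap in a sloppier table (say, one that rewards "made-up answer" for sounding confident) and the same loop learns that just as happily, which is the "safety" problem in miniature.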

i'm definitely too dumb for a lot of this...

but can the model have any motivation other than the one given by a prompt (or chain of prompts)?

i'm admittedly very ignorant of the body of knowledge on agentic action by these models. what function are they optimizing for? how does the model 'decide' on a 'good' answer (fit) to a prompt?