Tarun Chitra
@pinged
Part III: Escaping from Reasoning Model Purgatory ~~~ The most interesting part about Chain of Thought (CoT) reasoning is that, unlike a vanilla hallucinating LLM, CoT models convincingly assert falsehoods; the same mechanism that makes them avoid hallucinating also makes them dig in their heels (like a stubborn human).
10 replies
12 recasts
88 reactions
Tarun Chitra
@pinged
It took me quite a bit of playing around with reasoning models and trying to get different models to produce efficient answers to really understand this fact; when I first started playing with o3-mini, I more or less assumed that the proofs it claimed for the math problems I asked about were correct.

And in some ways, they *were* correct, but oftentimes the model would:

a) [Least nefarious] Subtly make an assumption that immediately implies the conclusion, e.g. "Prove: A ⟹ C. Assume A ⟹ B ⟹ C. So A ⟹ C!" (see the sketch below)
b) Go out of its way to "over-prove" a result
c) Hallucinate a reference (but usually from the right author!)
d) [Most nefarious] Get stuck rewriting the same (or a worse) wrong argument in multiple ways

Let's look at some examples!
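To make (a) concrete, here's a minimal sketch in Lean, with hypothetical propositions A, B, C standing in for the real claim: the "proof" only goes through because the hard step, B ⟹ C, is smuggled in as a hypothesis rather than proved.

```lean
-- Minimal sketch of failure mode (a); A, B, C are hypothetical stand-ins.
-- The "proof" of A → C type-checks only because the hard step (B → C)
-- is assumed as a hypothesis instead of being proved.
example (A B C : Prop) (hAB : A → B) (hBC : B → C) : A → C :=
  fun ha => hBC (hAB ha)  -- hBC already carries the entire conclusion
```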
1 reply
0 recasts
7 reactions
Tarun Chitra
@pinged
The first image is an example of a claim I tried to prove with o3-mini (re: kernel learning for Gaussian Processes); I'll elide most of the details, but note that the model jumped to define an approximate quantity that it claims solves the problem (at some minimax rate).

I ask for a proof of the approximation using Fano's inequality (for lower bounds on covers), which, after a careful read, I discover involves a wrongly flipped inequality. No problem! It's just a first-year grad student mistake (as @sreeramkannan would say), it can fix it, right?
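For context, the standard Fano-method lower bound has roughly this shape (generic form over a 2δ-separated family θ_1, …, θ_M, with V uniform over the M hypotheses; not the exact quantities from my prompt):

```latex
% Generic Fano-method minimax lower bound (V uniform over the M hypotheses)
\inf_{\hat{\theta}} \; \max_{1 \le j \le M} \; \mathbb{E}_{j}\!\left[ d\big(\hat{\theta}, \theta_j\big) \right]
  \;\ge\; \delta \left( 1 - \frac{I(V; X) + \log 2}{\log M} \right)
```

The point is that Fano gives you a *lower* bound on the risk; flip the inequality and the whole minimax argument quietly becomes vacuous, which is exactly the kind of error that's easy to skim past in a confident-sounding CoT proof.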
1 reply
1 recast
3 reactions
Tarun Chitra
@pinged
Going a little deeper, I find that I stump it (and probably spend $100s of OpenAI compute dollars, lol) by pointing out that the inequality was wrong. But it still made a mistake! In fact, it made a worse inequality mismatch than before: it flipped multiple inequalities in response to my telling it that it had flipped one incorrectly.

Time to ask for a fix again, but what happens? Again it gets stumped and needs to phone a friend. Ten prompts later, I realize it is on the warpath, replacing each mistake I find with 3 new ones, so that by the time I'm 10 prompts deep, I have 31+ mistakes in the proof.
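(Back-of-the-envelope for that count, assuming each prompt leaves the flagged error effectively unfixed and adds roughly 3 new ones:)

```latex
% Rough error tally after 10 rounds of "fixes" (hypothetical ~3 new errors per round)
1 + 3 \times 10 = 31
```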
1 reply
0 recasts
2 reactions