Tarun Chitra
@pinged
Wow, thanks for the resounding welcome back! As promised, I have a little to tell you about something that had me sort of offline-ish for the last couple months: I had my first bout of AI Doomer-itis. Luckily, it was cured by trying to write this paper with AI as my assistant and understanding the promises and flaws.
10 replies
19 recasts
110 reactions

Jason
@jachian
Out of curiosity, which flaws do you think reasoning models and LLMs will retain the longest?
1 reply
0 recasts
4 reactions

Tarun Chitra
@pinged
Good question; I often think of this Yann LeCun slide from 2022, when he was going around saying LLMs are useless. I think the reasoning models work (at least in some 0th-order sense) by learning how to compress the geometry of the grey ball to the red ellipsoid "quickly," whereas single LLMs get stuck diffusing through the grey ball.

On the other hand, a single self-attention head clearly learns sparse tasks really well and dense tasks (unless composed) poorly (cf. Sanford, Hsu, Telgarsky, 2024), so there's a sense in which they compress the geometry of the search space faster than other naive models.

Also, LeCun's toy model is obviously wrong (the independence argument can't hold for text, obviously).
0 replies
0 recasts
5 reactions
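
For context on that last point: the slide's toy model treats an autoregressive LLM as making an independent error on each token with some probability e, so the chance of an entire n-token answer being correct is (1 - e)^n, which decays exponentially. Below is a minimal sketch of why the independence assumption carries the whole argument; the numbers, the function names, and the crude block-correlation model are illustrative assumptions, not anything from the thread or the slide.

```python
# Toy sketch of the independence argument in the LeCun slide referenced above.
# Hypothetical numbers; the block-correlation model is an illustrative assumption,
# not anything claimed in the thread or on the slide.


def survival_independent(e: float, n: int) -> float:
    """P(all n tokens correct) when each token errs independently with probability e."""
    return (1.0 - e) ** n


def survival_block_correlated(e: float, n: int, block: int) -> float:
    """Same quantity when errors are perfectly correlated within blocks of
    `block` consecutive tokens, i.e. only n / block independent chances to fail."""
    return (1.0 - e) ** (n / block)


if __name__ == "__main__":
    e, n = 0.01, 500  # 1% per-token error rate, 500-token answer
    print(f"independent errors    : {survival_independent(e, n):.4f}")           # ~0.0066
    print(f"block-correlated (50) : {survival_block_correlated(e, n, 50):.4f}")  # ~0.9044
```

Once errors are allowed to be correlated across tokens (as they plausibly are for real text, since the model conditions on its own prefix), the effective number of independent chances to fail collapses and the exponential-decay conclusion no longer follows; that is the "independence argument can't hold for text" objection in the reply above.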