Leo
@lsn
Please give a warm welcome to my friend @rayd. He's an AI researcher with a paper out soon, working on making language models better at being agents and helping models resist prompt injection attacks (out on arXiv tomorrow). He's explained a bit about his research in the comments.

Raymond
@rayd
Glad to be here! The first paper is on why base models doing offline prediction are theoretically doomed to hallucinate, etc., until they get external feedback, and how the same underlying property makes it hard for them to generalise strategies better than those in their training data (1/-)

Raymond
@rayd
The second is on taking tasks where larger models do worse (like prompt injection) and helping models do better (a 30% gain) by treating them as mixtures of distributions and downweighting the bad distribution.
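
As a rough illustration of the mixture idea only: the thread doesn't describe the paper's actual method, so everything below is an assumption. The sketch treats a model's output distribution as `p_mix = (1 - alpha) * p_good + alpha * p_bad`, assumes we somehow have an estimate of the "bad" component `p_bad` and its weight `alpha` (both hypothetical inputs), and recovers the "good" component by subtracting the bad share and renormalising.

```python
def downweight_bad_component(p_mix, p_bad, alpha):
    """Recover p_good from p_mix = (1 - alpha) * p_good + alpha * p_bad.

    p_mix, p_bad: lists of probabilities over the same options.
    alpha: assumed weight of the bad component, in [0, 1).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Remove the bad component's share; clip at zero since real
    # estimates of p_bad and alpha would be noisy.
    raw = [max(m - alpha * b, 0.0) for m, b in zip(p_mix, p_bad)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("nothing left after removing the bad component")
    # Renormalise so the result is a probability distribution again.
    return [r / total for r in raw]


# Toy example over three hypothetical options (made-up numbers).
p_good_true = [0.8, 0.1, 0.1]   # e.g. "follow the real instruction"
p_bad = [0.1, 0.1, 0.8]         # e.g. "follow the injected instruction"
alpha = 0.4
p_mix = [(1 - alpha) * g + alpha * b for g, b in zip(p_good_true, p_bad)]

p_good = downweight_bad_component(p_mix, p_bad, alpha)
```

In this toy setup the recovered `p_good` matches `p_good_true` up to floating-point error, because `p_bad` and `alpha` are known exactly. In practice, estimating those two quantities would be the hard part.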