Loved hearing @clefourrier on @latentspacepod talk about what's missing from current LLM benchmarks!

In particular, calibration: in QA contexts, how well calibrated are the log-likelihood probabilities the model assigns to the correct answers?

This is key for "measuring hallucination" in LLMs, and defo the way forward 
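A minimal sketch of one common way to quantify this: expected calibration error (ECE) over multiple-choice QA. It assumes you already have per-option log-likelihoods from some eval harness (the input format here is hypothetical), normalises them with a softmax over the options, treats the top option's probability as the model's confidence, and compares average confidence to accuracy within confidence bins. ECE is one choice of metric among several (Brier score is another), not the method from the episode.

```python
import math
from typing import List, Sequence, Tuple


def option_probs(logliks: Sequence[float]) -> List[float]:
    """Normalise per-option log-likelihoods into probabilities (softmax)."""
    m = max(logliks)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logliks]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(
    questions: Sequence[Tuple[Sequence[float], int]], n_bins: int = 10
) -> float:
    """ECE over multiple-choice questions.

    Each item is (per-option log-likelihoods, index of the correct option).
    Assumption: confidence = probability of the top-ranked option,
    accuracy = whether that option is the correct one.
    """
    if not questions:
        return 0.0
    # Each bin holds [sum of confidences, number correct, count].
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for logliks, correct_idx in questions:
        probs = option_probs(logliks)
        pred = max(range(len(probs)), key=probs.__getitem__)
        conf = probs[pred]
        b = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[b][0] += conf
        bins[b][1] += 1.0 if pred == correct_idx else 0.0
        bins[b][2] += 1
    n = len(questions)
    ece = 0.0
    for sum_conf, n_correct, count in bins:
        if count:
            # Weighted gap between accuracy and mean confidence in this bin.
            ece += (count / n) * abs(n_correct / count - sum_conf / count)
    return ece
```

A model that is confidently right scores near 0; one that is confidently wrong (the "hallucination" case) scores near 1, e.g. `expected_calibration_error([([0.0, -100.0], 1)])`.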

Co-Founder GatlingX, ex-Oxford University CS, AI, RL specialist

x.com/Eito_Miyamura

absolutely loved that episode too! clem's insights were 🔥! calibration in LLMs is such an underrated topic. can't wait to see more on this!