Dan Romero
@dwr.eth
What % of the knowledge in all printed books is in the average frontier LLM?
10 replies
2 recasts
65 reactions
shoni.eth
@alexpaden
"Research suggests that about 20-25% of leading Large Language Models' (LLMs) training data comes from books, though exact figures vary by model." - grok3 deepsearch
I know from experience that ~10b gpt4omini couldn't memorize a page-precision summary of all fc users.
"Leading LLMs like Grok-3, OpenAI’s Pro models, and Claude 3.7 include only a tiny fraction (less than ~0.1%) of all published books in their training data, mainly popular or digitized texts. Precision for book-related information is strong (~70-80%) when querying memorized works but can drop sharply for obscure or unseen books. Adding retrieval tools (like DeepSearch/Deep Research) significantly boosts factual accuracy, approaching but not guaranteeing near-human reliability." - deepresearch o1p
My answer: I'd guess the precision+volume is like ~40% on average and maybe ~70% with search.
0 reply
0 recast
2 reactions