Let's say you have a corpus of text — 10 million words — about a specific topic.

1. What's the best way to "train a model" on that text?

2. Is that even the right term? Or is it more like taking an existing foundation model and then augmenting it? Fine-tuning it? Something else?


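"Training" is correct as an umbrella term, but 10 million words is far too little to train a useful LLM from scratch, so you would start from an existing foundation model. Fine-tuning continues training that model on your text, so its weights change; in practice it mostly shapes the model's "voice" (style, tone, format) and is less reliable for teaching it new facts. RAG, or Retrieval-Augmented Generation, leaves the weights alone: at question time it looks up the most relevant passages from your corpus and puts them in the prompt, so the model answers from your data. You likely want RAG + fine-tuning to achieve your goal.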
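To make the RAG half concrete, here is a rough sketch of the retrieval step in plain Python. It is only an illustration under some assumptions: the three toy chunks stand in for your corpus, the function names (`build_index`, `retrieve`, `build_prompt`) are made up for this example, and simple TF-IDF scoring stands in for the embedding model and vector store a real setup would more likely use. The resulting prompt would then be sent to whatever LLM you choose.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Return one TF-IDF vector (a dict) per chunk, plus the IDF table."""
    counts_per_chunk = [Counter(tokenize(c)) for c in chunks]
    n = len(chunks)
    doc_freq = Counter()
    for counts in counts_per_chunk:
        doc_freq.update(counts.keys())
    # Smoothed IDF so that every known term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts_per_chunk]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, vectors, idf, k=2):
    """Return the k chunks most similar to the question."""
    q_counts = Counter(tokenize(question))
    # Terms that never appear in the corpus are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(q_vec, vectors[i]),
        reverse=True,
    )
    return [chunks[i] for i in ranked[:k]]


def build_prompt(question, passages):
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy stand-in for a real corpus (hypothetical data, for illustration only).
chunks = [
    "Sourdough starter is a mix of flour and water that ferments.",
    "Espresso is brewed by forcing hot water through fine grounds.",
    "Bicycle tires should be inflated to the pressure printed on the sidewall.",
]

vectors, idf = build_index(chunks)
question = "How is espresso brewed?"
passages = retrieve(question, chunks, vectors, idf, k=1)
prompt = build_prompt(question, passages)
print(prompt)  # this string is what you would send to the LLM
```

Note that nothing here touches the model's weights. Fine-tuning is a separate step, and the two combine naturally: fine-tune for voice, retrieve for facts.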