Dan Romero
@dwr.eth
Let's say you have a corpus of text — 10 million words — about a specific topic. 1. What's the best way to "train a model" on that text? 2. Is that even the right term? Or is it using an existing foundational model and then augmenting it? Fine-tuning it? Something else?

ashesfall.eth
@ashesfall
That’s simply not a large enough corpus for present-day ML systems to learn, from scratch, the language the text is written in. So if you want useful language generation, your only real choice is to fine-tune an existing model, which lets you take advantage of language capabilities learned from a much larger dataset.
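To put rough numbers on the size argument, here is a back-of-the-envelope sketch. Both defaults are assumptions, not figures from this thread: about 1.3 tokens per English word is a common rule of thumb for subword tokenizers, and 300 billion tokens is used only as an illustrative order of magnitude for a large pretraining run. The function name `corpus_scale` is hypothetical.

```python
def corpus_scale(words: int,
                 tokens_per_word: float = 1.3,        # assumption: rough rule of thumb
                 pretraining_tokens: float = 300e9):  # assumption: illustrative scale only
    """Estimate the corpus size in tokens and how many times larger
    an assumed pretraining dataset would be."""
    corpus_tokens = words * tokens_per_word
    return {
        "corpus_tokens": corpus_tokens,
        "ratio": pretraining_tokens / corpus_tokens,
    }


# The 10-million-word corpus from the question above.
est = corpus_scale(10_000_000)
print(f"corpus is about {est['corpus_tokens']:,.0f} tokens")
print(f"assumed pretraining set is about {est['ratio']:,.0f}x larger")
```

Under these assumptions the corpus is on the order of ten million tokens, several orders of magnitude short of pretraining scale, which is consistent with the suggestion to start from an existing model rather than train one from scratch.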