Dan Romero
@dwr.eth
Let's say you have a corpus of text — 10 million words — about a specific topic. 1. What's the best way to "train a model" on that text? 2. Is that even the right term? Or is it using an existing foundational model and then augmenting it? Fine-tuning it? Something else?
18 replies
2 recasts
114 reactions
Nick
@nickporter
You want to lean heavily on retrieval-augmented generation (RAG). Let me follow up with some resources. I'm working on something similar, albeit with a smaller corpus, for a muni.
1 reply
0 recast
1 reaction
Nick
@nickporter
Not 1:1 relevant, but you might find these helpful for reframing and refining the objective: https://github.com/daveshap/SparsePrimingRepresentations https://medium.com/@dave-shap/beyond-vector-search-knowledge-management-with-generative-ai-6c2d10b481a0
1 reply
0 recast
0 reaction
Nick
@nickporter
In any case, you would want to rely on fine-tuning more for boosting prompts or driving a specific use case with what you are asking the RAG model to dig up. You definitely need a vector DB; Chroma is solid. Would be happy to connect you with the team I'm working with, @dwr.eth.
0 reply
0 recast
0 reaction
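For anyone following along, here is a minimal sketch of the retrieve-then-prompt loop that RAG refers to. It is not what anyone in this thread is running: the `ToyVectorStore` class, the bag-of-words `embed` function, and the sample passages are all hypothetical stand-ins. A real setup would swap in a learned embedding model and a vector DB such as Chroma, and would send the final prompt to an existing foundation model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. An assumption for illustration;
    # a real pipeline would use a learned embedding model instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    # Hypothetical in-memory stand-in for a vector DB.
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, passage: str) -> None:
        self.items.append((passage, embed(passage)))

    def query(self, question: str, k: int = 2) -> list[str]:
        # Rank stored passages by similarity to the question, keep the top k.
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    # The retrieved passages become context for an existing model;
    # no model weights are trained or changed in this step.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up sample passages standing in for chunks of the corpus.
store = ToyVectorStore()
for passage in [
    "Parking permits are issued by the city clerk's office.",
    "The water utility bills residents every two months.",
    "Building permits require a site plan and a fee.",
]:
    store.add(passage)

question = "How often does the water utility bill residents?"
top = store.query(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The point of the design is that the corpus stays outside the model: with 10 million words you would chunk the text, index the chunks once, and retrieve only the few most relevant ones per question.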