Leo
@lsn
Has anyone been doing large-scale thematic analysis of free text using LLMs? I'm about to start doing this at my startup and wondered if anyone had ideas about best practice and prompting. Please recast to other relevant channels!
4 replies
2 recasts
11 reactions
Kasra Rahjerdi
@jc4p
lemme know if you want to hop on a call to discuss this, i do it a ton (eg for a project with a training hospital we're taking reviews of the residents and classifying/extracting concepts) // i've done it a ton for curating LLM training data too
1 reply
0 recast
1 reaction
Kasra Rahjerdi
@jc4p
tl;dr most reliable for me has been:
- grab embeddings of all unique texts (e.g. each tweet, if you're doing tweets)
- k-means clustering with whatever N you think makes sense
- sampling from the center of each cluster
- asking an LLM to classify the topics in each cluster
2 replies
0 recast
1 reaction
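A rough sketch of the four steps above (embed, cluster, sample near each center, ask an LLM), using only the Python standard library so it runs anywhere. Everything named here is an assumption for illustration: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, `kmeans` is a bare-bones implementation rather than a library call, the example texts are made up, and `build_prompt` only builds the prompt string; sending it to an LLM is left out.

```python
import random
import re
import zlib

def embed(text, dim=64):
    """Toy stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, centers):
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda c: dist2(v, centers[c]))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means: returns (cluster index per vector, list of centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        assign = [nearest(v, centers) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    assign = [nearest(v, centers) for v in vectors]
    return assign, centers

def central_samples(texts, vectors, assign, centers, per_cluster=3):
    """For each cluster, pick the texts whose vectors lie closest to the center."""
    samples = {}
    for c, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: dist2(vectors[i], center))
        samples[c] = [texts[i] for i in members[:per_cluster]]
    return samples

def build_prompt(cluster_texts):
    """Prompt asking an LLM to name the theme of one cluster (wording is hypothetical)."""
    lines = "\n".join(f"- {t}" for t in cluster_texts)
    return "These texts were grouped together. Name the shared theme in a few words:\n" + lines

# Made-up example texts; duplicates are dropped before embedding.
raw_texts = [
    "the app crashes when I upload a photo",
    "upload fails and the app crashes",
    "love the new dark mode design",
    "dark mode looks great, nice design",
    "the app crashes when I upload a photo",
]
texts = list(dict.fromkeys(raw_texts))
vectors = [embed(t) for t in texts]
assign, centers = kmeans(vectors, k=2)
samples = central_samples(texts, vectors, assign, centers, per_cluster=2)
prompts = {c: build_prompt(s) for c, s in samples.items() if s}
```

In practice you'd swap `embed` for a real embedding model and a library k-means, and send each entry of `prompts` to whatever LLM you use; the N for k-means is still a judgment call, as noted above.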
Kasra Rahjerdi
@jc4p
oh, and if you're doing hardcoded categories just make a classifier (RoBERTa or similar)
1 reply
0 recast
1 reaction
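One way to read "make a classifier": instead of prompting an LLM for each text, train a small supervised model on hand-labeled examples and let it map each text to one of the fixed categories. The post suggests RoBERTa; the sketch below deliberately swaps in a tiny naive Bayes model so that it runs with no dependencies, so treat it as an illustration of the train-then-predict workflow rather than the setup described above. The category names and training examples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in for a fine-tuned model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of examples
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        tokens = tokenize(text)

        def score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.label_counts[label] / total)  # log prior
            for tok in tokens:
                s += math.log((counts[tok] + 1) / denom)    # smoothed log likelihood
            return s

        return max(self.label_counts, key=score)

# Hypothetical hand-labeled examples for two fixed categories.
train = [
    ("explains procedures clearly to patients", "communication"),
    ("listens carefully and answers questions", "communication"),
    ("strong surgical technique and suturing", "clinical_skill"),
    ("accurate diagnosis and careful technique", "clinical_skill"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
first = model.predict("answers patients questions clearly")
second = model.predict("careful surgical technique")
```

With a real model the shape stays the same: a labeled set, a fit step, then cheap per-text predictions that always land in one of the fixed categories.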
Leo
@lsn
When you say "make a classifier" what do you mean? With hardcoded values I've just asked ChatGPT to pick one of my categories but I assume you might mean something a bit more low level
1 reply
0 recast
1 reaction