Leo
@lsn
Has anyone been doing large-scale thematic analysis of free text using LLMs? I'm about to start doing this at my startup and wondered if anyone had ideas about best practices and prompting. Please recast to other relevant channels! 🙏

Kasra Rahjerdi
@jc4p
lemme know if you want to hop on a call to discuss this, i do it a ton (eg for a project with a training hospital we’re taking reviews of the residents and classifying/extracting concepts) // i’ve done it a ton for curating LLM training data too

Kasra Rahjerdi
@jc4p
tl;dr most reliable for me has been:
- grab embeddings of all unique texts (one per item, e.g. one per tweet if you're doing tweets)
- k-means clustering with whatever N you think makes sense
- sampling from the center of each cluster
- asking an LLM to classify the topics in each cluster
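
A rough sketch of those steps, under a few loud assumptions: the embeddings are already computed (the 2-D vectors below are made-up toy values, not real embeddings), the k-means here is a small hand-rolled version with a deterministic farthest-point start so the example runs with only the standard library (in practice you'd probably reach for a library implementation), and `build_prompt` is a hypothetical helper whose wording is just one option. The LLM call itself is left out.

```python
from math import dist


def kmeans(vectors, k, iters=50):
    """Plain Lloyd's k-means with a deterministic farthest-point init."""
    # Init: start from the first vector, then keep adding the vector that is
    # farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def central_samples(texts, vectors, centroids, labels, per_cluster=2):
    """For each cluster, return the texts closest to its centroid."""
    samples = {}
    for c, centroid in enumerate(centroids):
        members = [
            (dist(v, centroid), t)
            for t, v, lab in zip(texts, vectors, labels)
            if lab == c
        ]
        members.sort(key=lambda pair: pair[0])
        samples[c] = [t for _, t in members[:per_cluster]]
    return samples


def build_prompt(cluster_texts):
    """Hypothetical prompt: ask the LLM to name the theme of one cluster."""
    joined = "\n".join(f"- {t}" for t in cluster_texts)
    return f"These texts were clustered together. Name their shared topic:\n{joined}"


# Toy data: stand-in texts with made-up 2-D "embeddings".
texts = [
    "app keeps crashing", "crashes on login", "crash after update",
    "love the new design", "design looks great", "really nice design",
]
vectors = [
    (0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
    (1.0, 0.9), (0.9, 1.0), (1.0, 1.0),
]

centroids, labels = kmeans(vectors, k=2)
samples = central_samples(texts, vectors, centroids, labels)
prompts = {c: build_prompt(ts) for c, ts in samples.items()}
```

Each entry in `prompts` would then go to whatever LLM you're using, one call per cluster. Sampling near the centroid keeps the prompt short while still showing the model the most typical texts in the cluster.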

Kasra Rahjerdi
@jc4p
oh, and if you're doing hardcoded categories just make a classifier (RoBERTa etc.)
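
To make "make a classifier" a bit more concrete: the idea is to train a small supervised model on labelled examples of your fixed categories instead of prompting for each text. The sketch below is NOT RoBERTa; it swaps in a tiny bag-of-words naive Bayes classifier so it runs with only the standard library, and the categories and training texts are invented for illustration. Fine-tuning a RoBERTa-style model follows the same train-then-predict shape, just with a much stronger model.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """examples: list of (text, label). Returns per-label word and doc counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = log(n_docs / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words don't zero the score.
            score += log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented labelled examples for two hardcoded categories.
train_data = [
    ("app keeps crashing on login", "bug"),
    ("crash after the latest update", "bug"),
    ("error when I upload a file", "bug"),
    ("love the new design", "praise"),
    ("great app really helpful", "praise"),
    ("the design looks great", "praise"),
]

word_counts, label_counts = train(train_data)
pred_bug = predict("it keeps crashing after update", word_counts, label_counts)
pred_praise = predict("great design", word_counts, label_counts)
```

Once trained, a classifier like this labels new texts without any LLM call, which is what makes it attractive when the category list is fixed and the volume is large.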

Leo
@lsn
When you say ‘make a classifier’, what do you mean? With hardcoded values I’ve just asked ChatGPT to pick one of my categories, but I assume you mean something a bit more low-level.