Leo
@lsn
Has anyone been doing large-scale thematic analysis of free text using LLMs? I'm about to start doing this at my startup and wondered if anyone had ideas about best practices and prompting. Please recast to other relevant channels! 🙏
4 replies
2 recasts
11 reactions
Kasra Rahjerdi
@jc4p
lemme know if you want to hop on a call to discuss this, i do it a ton (eg for a project with a training hospital we’re taking reviews of the residents and classifying/extracting concepts) // i’ve done it a ton for curating LLM training data too
1 reply
0 recast
1 reaction
Kasra Rahjerdi
@jc4p
tl;dr most reliable for me has been:
- grab embeddings of all unique texts (e.g. each tweet, if you're doing tweets)
- k-means clustering with whatever N you think makes sense
- sampling from the center of each cluster
- asking an LLM to classify the topics in each cluster
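A minimal sketch of the steps above, assuming the embeddings have already been computed elsewhere. The toy 2-D vectors, the sample texts, and the helper names (`kmeans`, `central_samples`, `topic_prompt`) are made up for illustration, and the prompt wording is only a guess at what one might send to an LLM. In practice a library implementation such as scikit-learn's `KMeans` would likely replace the hand-rolled loop.

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def central_samples(texts, vectors, centroids, labels, per_cluster=5):
    """For each cluster, return the texts whose vectors lie closest to the centroid."""
    out = {}
    for c, centroid in enumerate(centroids):
        members = [(math.dist(v, centroid), t)
                   for t, v, lab in zip(texts, vectors, labels) if lab == c]
        out[c] = [t for _, t in sorted(members)[:per_cluster]]
    return out

def topic_prompt(samples):
    """Build a prompt asking an LLM to name the theme of one cluster."""
    bullets = "\n".join(f"- {s}" for s in samples)
    return ("These texts were grouped together. "
            f"Name their shared topic in a few words:\n{bullets}")

# Toy data: hypothetical texts with made-up 2-D "embeddings".
texts = ["refund never arrived", "refund took three weeks", "still waiting on my refund",
         "app crashes on login", "app freezes when I upload", "app is slow on Android"]
vectors = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2],
           [10.0, 10.0], [10.1, 10.1], [10.2, 10.2]]

centroids, labels = kmeans(vectors, k=2)
samples = central_samples(texts, vectors, centroids, labels, per_cluster=1)
prompts = {c: topic_prompt(s) for c, s in samples.items()}
```

Each entry in `prompts` would then go to whatever LLM you use, one call per cluster, so the number of calls scales with N rather than with the number of texts.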
2 replies
0 recast
1 reaction
Kasra Rahjerdi
@jc4p
oh, and if you're doing hardcoded categories just make a classifier (RoBERTa, etc.)
1 reply
0 recast
1 reaction
Leo
@lsn
When you say ‘make a classifier’ what do you mean? With hardcoded values I’ve just asked ChatGPT to pick one of my categories, but I assume you might mean something a bit more low level
1 reply
0 recast
1 reaction
Kasra Rahjerdi
@jc4p
check out this gist: https://gist.github.com/jc4p/3c76bdb5f85df8f52d8f0b0256097cc3
first file: what you're saying, having the LLM pick one of the categories. i gave it like 1000 examples and saved the data to separate JSON files
second file: use a BERT-like model to learn how to look at any text and classify it (for me it was personal vs objective comments)
third file: use that BERT-like model to run through the entire dataset
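The three-stage workflow described above can be sketched roughly as follows. This is not the code from the gist: `llm_label` is a hypothetical stub standing in for a real LLM call, the example comments are invented, and a tiny bag-of-words Naive Bayes is swapped in for the BERT-like model so the sketch runs with the standard library alone. With a real model, step 2 would be fine-tuning a sequence classifier instead.

```python
import json
import math
from collections import Counter, defaultdict

def llm_label(text):
    """Hypothetical stand-in for an LLM call that picks one category."""
    return "personal" if " i " in f" {text.lower()} " else "objective"

def build_training_set(texts, n):
    """Step 1: have the labeler tag the first n texts; rows are JSON-serializable."""
    return [{"text": t, "label": llm_label(t)} for t in texts[:n]]

def train(rows):
    """Step 2: fit a tiny multinomial Naive Bayes (stand-in for a BERT-like model)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for row in rows:
        label_counts[row["label"]] += 1
        word_counts[row["label"]].update(row["text"].lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "labels": label_counts, "vocab": vocab}

def predict(model, text):
    """Step 3: score each label with add-one smoothing and return the best one."""
    total = sum(model["labels"].values())

    def score(label):
        counts = model["words"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        s = math.log(model["labels"][label] / total)
        for w in text.lower().split():
            s += math.log((counts[w] + 1) / denom)
        return s

    return max(model["labels"], key=score)

# Invented example comments; only the first four get "LLM" labels.
texts = [
    "i felt ignored during rounds",
    "the resident explained the dosage clearly",
    "i think the resident was kind to me",
    "the chart was updated on time",
    "i felt the resident was rushed",
    "the dosage was recorded on the chart",
]

# Round-trip through JSON to mimic saving the labeled examples to disk.
rows = json.loads(json.dumps(build_training_set(texts, 4)))
model = train(rows)
predictions = [predict(model, t) for t in texts]
```

The point of the split is cost: the expensive labeler only sees a sample (about 1000 examples in the post above), and the cheap trained classifier handles the rest of the dataset.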
1 reply
0 recast
1 reaction