Kasra Rahjerdi
@jc4p
might've underestimated how long it'd take me to process 157 million casts, but in 6 hours i'll have authoritative rankings on who said what words first!!
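For illustration, here is a minimal sketch of how a "who said what word first" pass could work. It is not the actual pipeline: the `Cast` record, the regex tokenizer, and the `normalize` helper (a crude stand-in for a real lemmatizer) are all assumptions made up for this example.

```python
import re
from typing import Iterable, NamedTuple


class Cast(NamedTuple):
    # Hypothetical record shape; the real dataset schema may differ.
    author: str
    timestamp: int  # e.g. seconds since epoch
    text: str


TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(token: str) -> str:
    # Crude stand-in for lemmatization: strip a trailing plural "s".
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def first_users(casts: Iterable[Cast]) -> dict[str, tuple[str, int]]:
    """Map each normalized word to the (author, timestamp) of its earliest use."""
    first: dict[str, tuple[str, int]] = {}
    for cast in casts:
        for raw in TOKEN_RE.findall(cast.text.lower()):
            word = normalize(raw)
            seen = first.get(word)
            # Keep the earliest timestamp, so input order does not matter.
            if seen is None or cast.timestamp < seen[1]:
                first[word] = (cast.author, cast.timestamp)
    return first
```

A single pass like this keeps only one entry per distinct word in memory, so the cost is dominated by tokenizing the casts themselves.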
kevin j
@entropybender
ooh nice, i was thinking of building this into our agent but decided it was too much work. is it exact text/ngram or embedding based? basically i wanted to be able to answer if someone asked "i saw this cast by ___ recently, are there other people who discussed similar things before?"
Kasra Rahjerdi
@jc4p
this rn is lemmatization + tokenizing, but for your case you def need embeddings. you can generate those pretty cheaply on a local GPU using the hf dataset and https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 tho!
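A rough sketch of the embedding approach for the "did anyone discuss something similar before?" question, assuming the `sentence-transformers` package is installed. The `Cast` record, `earlier_similar`, and `load_minilm_encoder` are hypothetical names for this example, not anything from the thread, and the search is plain brute force.

```python
import math
from typing import Callable, NamedTuple, Sequence


class Cast(NamedTuple):
    # Hypothetical record shape; adapt to the real dataset's columns.
    author: str
    timestamp: int
    text: str


Encoder = Callable[[list[str]], Sequence[Sequence[float]]]


def load_minilm_encoder() -> Encoder:
    # Imported lazily so the search logic below works without the dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode  # maps a list of strings to one vector per string


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return float(dot / (norm_a * norm_b)) if norm_a and norm_b else 0.0


def earlier_similar(
    query: Cast, casts: list[Cast], encode: Encoder, top_k: int = 5
) -> list[tuple[float, Cast]]:
    """Rank casts by other authors, posted before `query`, by similarity to it."""
    candidates = [
        c for c in casts
        if c.timestamp < query.timestamp and c.author != query.author
    ]
    if not candidates:
        return []
    vectors = encode([query.text] + [c.text for c in candidates])
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, v), c) for v, c in zip(cand_vecs, candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Usage would look like `earlier_similar(query, casts, load_minilm_encoder())`. This re-encodes every candidate on each call, which is fine for a demo; at the scale of the full dataset you would presumably embed everything once and store the vectors.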