Kasra Rahjerdi
@jc4p
might've underestimated how long it'd take me to process 157 million casts, but in 6 hours i'll have authoritative rankings on who said what words first!!
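The thread doesn't show the actual pipeline. Here is a minimal sketch of the "who said each word first" idea. It assumes each cast carries an author, a timestamp and its text, and it uses the tokenizing + lemmatization approach Kasra describes further down. A single pass keeps only the earliest sighting of each lemma, so the casts don't need to be sorted first. `Cast`, `naive_lemma`, `first_users` and `rank_authors` are hypothetical names. `naive_lemma` is a crude placeholder for a real lemmatizer such as spaCy's `token.lemma_` or NLTK's `WordNetLemmatizer`. Ranking authors by how many words they used first is only one possible reading of "rankings".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Cast:
    author: str
    timestamp: int  # e.g. Unix seconds (assumed field layout)
    text: str


# Simple tokenizer: runs of lowercase letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def naive_lemma(token: str) -> str:
    """Hypothetical stand-in for a real lemmatizer: strips a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def first_users(casts):
    """Map each lemma to the (timestamp, author) of its earliest cast."""
    first = {}
    for cast in casts:
        for token in TOKEN_RE.findall(cast.text.lower()):
            lemma = naive_lemma(token)
            seen = first.get(lemma)
            # Keep the earliest timestamp; on a tie the first cast seen wins.
            if seen is None or cast.timestamp < seen[0]:
                first[lemma] = (cast.timestamp, cast.author)
    return first


def rank_authors(first):
    """Count how many lemmas each author used first, highest count first."""
    counts = {}
    for _, author in first.values():
        counts[author] = counts.get(author, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, if bob's "gm everyone" is older than alice's "Gm frens", bob gets credited with "gm" even though alice's cast appears first in the input.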

kevin j
@entropybender
ooh nice, i was thinking of building this into our agent but decided it was too much work. is it exact text/ngram or embedding based? basically i wanted to be able to answer questions like "i saw this cast by ___ recently, are there other people who discussed similar things before?"

Kasra Rahjerdi
@jc4p
this rn is lemmatization + tokenizing, but for your case you def need embeddings. you can generate those pretty cheaply on a local GPU using the hf dataset and https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 tho!
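For the "similar things before" question, here is a minimal sketch of the embedding route. Embed every cast once, then rank earlier casts by other authors by their cosine similarity to the query cast. In practice the vectors would presumably come from something like `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(texts, normalize_embeddings=True)`, which yields 384-dimensional vectors. Here toy 3-dimensional vectors stand in so the sketch runs without a model download. The `(cast_id, author, timestamp, vector)` row layout and the function names are assumptions, not the dataset's actual schema.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def similar_earlier_casts(query_vec, query_author, query_ts, corpus, k=3):
    """Return up to k (cast_id, cosine_similarity) pairs, most similar first.

    corpus: iterable of (cast_id, author, timestamp, vector) rows (assumed layout).
    """
    q = normalize(query_vec)
    scored = []
    for cast_id, author, ts, vec in corpus:
        # Only other people, and only casts posted before the query cast.
        if author == query_author or ts >= query_ts:
            continue
        v = normalize(vec)
        scored.append((cast_id, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This is a brute-force scan over every cast. At the scale of 157 million casts, an approximate nearest-neighbor index (e.g. FAISS) would likely be needed, but the filtering and ranking logic stays the same.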

kevin j
@entropybender
i won an optimism grant to do something like this and oss it, but never started cuz they never gave me the money lmao. something like this would be very useful so that all clients have access to a solid, base-level search engine