Kasra Rahjerdi
@jc4p
might've underestimated how long it'd take me to process 157 million casts, but in 6 hours i'll have authoritative rankings on who said what words first!!
13 replies
7 recasts
74 reactions
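A minimal sketch of what a "who said what words first" pass could look like, not the author's actual pipeline. The `(timestamp, username, text)` record shape, the regex tokenizer, and the `lemmatize` hook are all assumptions for illustration; the hook defaults to identity, and a real lemmatizer (for example from spaCy or NLTK) could be passed in instead.

```python
import re
from typing import Callable, Iterable

# Hypothetical cast record: (timestamp, username, text). The layout is an assumption.
Cast = tuple[int, str, str]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def first_users(
    casts: Iterable[Cast],
    lemmatize: Callable[[str], str] = lambda word: word,
) -> dict[str, tuple[int, str]]:
    """Map each normalized word to the (timestamp, username) of its earliest use."""
    first: dict[str, tuple[int, str]] = {}
    for ts, user, text in casts:
        # A set avoids re-checking a word repeated within one cast.
        for word in set(tokenize(text)):
            key = lemmatize(word)
            # Strict "<" keeps the first-seen cast when timestamps tie.
            if key not in first or ts < first[key][0]:
                first[key] = (ts, user)
    return first


# Casts do not need to arrive in time order; the earliest timestamp wins.
sample = [
    (30, "carol", "Shipping frames today"),
    (10, "alice", "frames are cool"),
    (20, "bob", "cool cool cool"),
]
rankings = first_users(sample)
```

At 157 million casts the same idea would likely need to stream from disk or run in a database rather than hold everything in one dict, but the per-word "keep the earliest timestamp" rule stays the same.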
kevin j
@entropybender
ooh nice, i was thinking of building this into our agent but decided it was too much work. is it exact text/ngram or embedding based? basically i wanted to be able to answer if someone asked "i saw this cast by ___ recently, are there other people who discussed similar things before?"
1 reply
0 recast
0 reaction
Kasra Rahjerdi
@jc4p
this rn is lemmatization + tokenizing but for your case you def need embeddings, can generate those embeddings pretty cheaply on a local GPU using the hf dataset and https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 tho!
1 reply
0 recast
2 reactions
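A rough sketch of the embedding route for the "did anyone discuss something similar before?" question, under stated assumptions. `load_minilm_encoder` wraps the linked model through the `sentence-transformers` package (imported lazily, so the rest runs without it); the `(timestamp, username, text)` record shape and the `earlier_similar` helper are hypothetical names, not part of any existing tool.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Encoder = Callable[[list[str]], list[list[float]]]
Cast = tuple[int, str, str]  # assumed shape: (timestamp, username, text)


def load_minilm_encoder() -> Encoder:
    """Build an encoder from all-MiniLM-L6-v2 (needs `pip install sentence-transformers`)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return lambda texts: model.encode(texts, normalize_embeddings=True).tolist()


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def earlier_similar(
    query: Cast, casts: list[Cast], encode: Encoder, top_k: int = 3
) -> list[tuple[float, Cast]]:
    """Rank casts posted before `query` by similarity to the query text."""
    q_ts, _, q_text = query
    earlier = [c for c in casts if c[0] < q_ts]
    if not earlier:
        return []
    vectors = encode([q_text] + [c[2] for c in earlier])
    q_vec, rest = vectors[0], vectors[1:]
    scored = [(cosine(q_vec, v), c) for v, c in zip(rest, earlier)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling `earlier_similar(query, casts, load_minilm_encoder())` would use the real model. For a full archive, the embeddings would normally be computed once and stored in a vector index instead of being re-encoded per query as this sketch does.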
kevin j
@entropybender
i won an optimism grant to do something like this and oss it but never started cuz they never gave me the money lmao. something like this would be very useful so all clients have access to a solid base-level search engine
2 replies
0 recast
0 reaction
Kasra Rahjerdi
@jc4p
lmaoooo yeah if you have a local RTX GPU it’s very doable for cheap but will take a day or two to run, if you have $100-$200 to waste it can be done on a remote machine much quicker
1 reply
0 recast
0 reaction
Kasra Rahjerdi
@jc4p
the hard part is constantly updating it cause the hub api doesn’t have any concept of “get me casts from the last 24 hours”
0 reply
0 recast
0 reaction
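One possible workaround for the missing time-range query, sketched under assumptions: keep a per-user checkpoint and filter locally. `fetch_user_casts` is a hypothetical stand-in for whatever call lists one user's casts, not a real hub API function, and the dict shape with `hash`, `fid`, `timestamp`, and `text` keys is assumed for illustration.

```python
from typing import Callable, Iterable

Cast = dict  # assumed shape: {"hash": str, "fid": int, "timestamp": int, "text": str}


def sync_new_casts(
    fids: Iterable[int],
    fetch_user_casts: Callable[[int], Iterable[Cast]],  # hypothetical fetcher
    checkpoints: dict[int, int],
) -> list[Cast]:
    """Return casts newer than each user's checkpoint, advancing checkpoints in place."""
    fresh: list[Cast] = []
    for fid in fids:
        last_seen = checkpoints.get(fid, -1)
        newest = last_seen
        for cast in fetch_user_casts(fid):
            # Filter client-side, since the server cannot filter by time for us.
            if cast["timestamp"] > last_seen:
                fresh.append(cast)
                newest = max(newest, cast["timestamp"])
        checkpoints[fid] = newest
    return fresh
```

This still walks every user on every run, which is the expensive part the cast is pointing at. It can also miss a cast that shares a timestamp with the stored checkpoint but shows up later, so deduplicating by `hash` would be a safer variant.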