Kasra Rahjerdi
@jc4p
fam in ~18 hours we'll have 157 million casts with embeddings, pray for me it doesn't crash overnight
11 replies
11 recasts
113 reactions

shoni.eth
@alexpaden
not sure if it will help or not but here's mine, plus 8-bit quantization applied afterward on the f16 output. i rerun this every 5 min https://gist.github.com/alexpaden/b99668307e6e16c18e5ce581c8d719b8
1 reply
0 recast
1 reaction
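
(a minimal sketch of the "8-bit quantization after f16" step described above, not the gist itself; it assumes embeddings arrive as a float16 numpy matrix and uses a simple symmetric per-vector scale)

```python
import numpy as np

def quantize_int8(embs_f16: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-vector int8 quantization; halves storage vs f16."""
    embs = embs_f16.astype(np.float32)                # widen before dividing
    scales = np.abs(embs).max(axis=1, keepdims=True)  # one scale per vector
    scales[scales == 0] = 1.0                         # guard all-zero vectors
    codes = np.round(embs / scales * 127.0).astype(np.int8)
    return codes, scales.squeeze(1).astype(np.float16)

# dequantize later for search: codes.astype(np.float32) * scales[:, None]
```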

shoni.eth
@alexpaden
^ it's cpu-optimized for the mac studio, not gpu
1 reply
0 recast
1 reaction

Kasra Rahjerdi
@jc4p
love it, thank you! i'm planning on doing monthly ingestion of casts + monthly embeddings of those so hopefullyyyy this will be easier after i finish this giant backfill
1 reply
0 recast
0 reaction

shoni.eth
@alexpaden
yeah, only 1 failure isn't bad at all. it took me like a week because i was trying to figure out optimizations, which i think i'm still missing a bunch of
1 reply
0 recast
1 reaction

Kasra Rahjerdi
@jc4p
silly me, i thought the hard part would be writing a super-optimized gRPC client šŸ˜‚šŸ™ˆ
1 reply
0 recast
0 reaction

shoni.eth
@alexpaden
lol i casted about my learnings before (idk where) but the code prob says it better. basically preallocating memory buffers and, in my case, using mps. i think on avg i was at ~10k texts per second, but if correctly optimized it could prob break 50-100k. gpu inference is faster i think, idk. this code records the time taken per major section, which was helpful for me
1 reply
0 recast
1 reaction
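
(a sketch of the pattern described above: one preallocated output buffer, mps as the device, and coarse per-section timers; the model name and batch sizes are placeholders assuming sentence-transformers, not shoni's actual code)

```python
import time
import numpy as np
from sentence_transformers import SentenceTransformer

# placeholder model; the real model and dims come from the gist, not here
model = SentenceTransformer("all-MiniLM-L6-v2", device="mps")
dim = model.get_sentence_embedding_dimension()

def embed_all(texts: list[str], chunk: int = 4096) -> np.ndarray:
    out = np.empty((len(texts), dim), dtype=np.float16)  # preallocated once
    t_encode = t_copy = 0.0
    for i in range(0, len(texts), chunk):
        t0 = time.perf_counter()
        vecs = model.encode(texts[i:i + chunk], batch_size=256,
                            convert_to_numpy=True)
        t1 = time.perf_counter()
        out[i:i + len(vecs)] = vecs.astype(np.float16)   # write into the buffer
        t_encode += t1 - t0
        t_copy += time.perf_counter() - t1
    print(f"encode {t_encode:.1f}s  copy {t_copy:.1f}s  "
          f"~{len(texts) / max(t_encode, 1e-9):,.0f} texts/s")
    return out
```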

Kasra Rahjerdi
@jc4p
how are you actually pulling the casts? i do a cloud hub + .NET multiprocessing + multiplexed connections
1 reply
0 recast
0 reaction
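
(kasra's client is .NET, but the multiplexed-connections shape translates; a hedged asyncio sketch where GetCastsByFid / FidRequest come from the hub-monorepo protobufs, the generated module names depend on your own protoc run, and HUB_URL is a placeholder)

```python
import asyncio
import grpc

# stubs generated from the hub-monorepo protos; the module names below are
# whatever your protoc run produced, so adjust them to match
from hub_grpc import request_response_pb2, rpc_pb2_grpc

HUB_URL = "hub.example.com:2283"  # placeholder cloud hub address
N_CHANNELS = 4                    # multiplex work across a few connections

async def fetch_fid_casts(stub, fid: int) -> list:
    """Page through every cast for one fid via GetCastsByFid."""
    casts, token = [], b""
    while True:
        resp = await stub.GetCastsByFid(request_response_pb2.FidRequest(
            fid=fid, page_size=1000, page_token=token))
        casts.extend(resp.messages)
        token = resp.next_page_token
        if not token:
            return casts

async def backfill(fids: list[int]) -> list:
    channels = [grpc.aio.insecure_channel(HUB_URL) for _ in range(N_CHANNELS)]
    stubs = [rpc_pb2_grpc.HubServiceStub(ch) for ch in channels]
    sem = asyncio.Semaphore(32)  # cap total in-flight RPCs

    async def one(i: int, fid: int):
        async with sem:
            return await fetch_fid_casts(stubs[i % N_CHANNELS], fid)

    try:
        return await asyncio.gather(*(one(i, f) for i, f in enumerate(fids)))
    finally:
        for ch in channels:
            await ch.close()
```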

shoni.eth
@alexpaden
i run the neynar parquet service, which is a bit pricey but has some bonus features like spam labels and stuff: https://github.com/alexpaden/neynar_parquet_importer. i take that database, which has all the fundamentals (updated every 5 min), then built another pipeline to run stuff like this and generate new columns and advanced analytics tables
2 replies
0 recast
1 reaction
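
(shoni's follow-on pipeline isn't shown; one way the "generate new columns / analytics tables" step could look is a duckdb pass over the imported casts, where the path and column names are assumptions rather than neynar's actual schema)

```python
import duckdb

# the path and column names here are illustrative; the real schema comes
# from neynar's parquet exports, not from this sketch
con = duckdb.connect("analytics.duckdb")
con.sql("""
    CREATE OR REPLACE TABLE cast_stats AS
    SELECT
        fid,
        count(*)                                    AS n_casts,
        avg(length(text))                           AS avg_len,
        count(*) FILTER (WHERE text LIKE '%http%')  AS n_with_links
    FROM read_parquet('casts/*.parquet')
    GROUP BY fid
""")
```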

shoni.eth
@alexpaden
i run it all on my mac studio so i don't have to pay for cloud postgres and whatnot, which was itself expensive for a full backfill (and i got trapped in that plan on digital ocean). i run all this and expose the db via tailscale on a fully maxed-out mac studio. i'll do more with llms later on another studio or a gpu setup for training small or medium models
0 reply
0 recast
0 reaction

Kasra Rahjerdi
@jc4p
love that! i tried running hubble from the hub-monorepo but it was so slow i wanted to die haha. i kind of REFUSE to pay for open data though, otherwise i'd use neynar too (granted, i use their api for a ton of frames)
2 replies
0 recast
0 reaction