Kasra Rahjerdi
@jc4p
fam in ~18 hours we'll have 157 million casts with embeddings, pray for me it doesn't crash overnight
10 replies
11 recasts
89 reactions

shoni.eth
@alexpaden
not sure if it will help or not, but here's mine, with 8-bit quantization run afterward on the f16 embeddings. i rerun this every 5 min https://gist.github.com/alexpaden/b99668307e6e16c18e5ce581c8d719b8
1 reply
0 recast
0 reaction
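The gist above isn't reproduced here, but the "8-bit quantization after on f16" idea can be sketched generically: embed in half precision, then compress each vector to int8 with a per-vector scale for dequantization. This is an illustrative scheme, not necessarily the exact one in the gist.

```python
import numpy as np

def quantize_int8(embs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-vector int8 quantization of f16/f32 embeddings.

    Returns int8 codes plus the per-vector scale needed to dequantize.
    Illustrative sketch; the linked gist may use a different scheme.
    """
    embs = embs.astype(np.float32)
    # per-row max-abs scale so each vector maps onto [-127, 127]
    scale = np.abs(embs).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero rows
    codes = np.round(embs / scale).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # approximate reconstruction of the original float vectors
    return codes.astype(np.float32) * scale
```

Storing `(codes, scale)` cuts embedding storage roughly 4x versus f32 (2x versus f16) at the cost of a small reconstruction error per vector.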

shoni.eth
@alexpaden
^ it's cpu-optimized for a mac studio, not gpu
1 reply
0 recast
0 reaction

Kasra Rahjerdi
@jc4p
love it, thank you! i’m planning on doing monthly ingestion of casts + monthly embeddings of those so hopefullyyyy this will be easier after i finish this giant backfill
1 reply
0 recast
0 reaction

shoni.eth
@alexpaden
yeah, only 1 failure isn't bad at all. it took me like a week because i was trying to figure out optimizations, which i think i'm still missing a bunch of
1 reply
0 recast
0 reaction

Kasra Rahjerdi
@jc4p
silly me i thought the hard part would be writing a super optimized gRPC client 😂🙈
1 reply
0 recast
0 reaction

shoni.eth
@alexpaden
lol i casted about my learnings before (idk where) but the code prob says it better. basically preallocating memory buffers and, in my case, using mps. i think on avg i was at ~10k texts per second, but if correctly optimized it could prob break 50-100k. gpu is prob faster for inference, idk. this code records the time taken per major section, which was helpful for me
1 reply
0 recast
0 reaction
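The two tricks mentioned above (preallocating the output buffer and timing each major section) can be sketched like this. `embed_batch` is a stand-in for a real model call (e.g. a sentence-transformers model on MPS); everything else is the generic pattern.

```python
import time
import numpy as np

def embed_batch(texts):
    # placeholder embedder; swap in a real model running on MPS/GPU
    return np.zeros((len(texts), 384), dtype=np.float32)

def run_pipeline(texts, batch_size=256, dim=384):
    """Embed texts in batches into one preallocated buffer,
    recording wall-clock time per major section."""
    out = np.empty((len(texts), dim), dtype=np.float32)  # allocated once
    timings = {"embed": 0.0, "copy": 0.0}
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        t0 = time.perf_counter()
        vecs = embed_batch(batch)
        timings["embed"] += time.perf_counter() - t0
        t0 = time.perf_counter()
        out[start:start + len(batch)] = vecs  # no per-batch allocation
        timings["copy"] += time.perf_counter() - t0
    return out, timings
```

The per-section timings make it obvious whether the model call, the copy, or something upstream (tokenization, IO) is the bottleneck before reaching for further optimizations.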

Kasra Rahjerdi
@jc4p
how are you actually pulling the casts? i do a cloud hub + .net multiprocessing + multiplexed connections
1 reply
0 recast
0 reaction
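The fan-out pattern described here (many multiplexed connections draining paginated cast queries in parallel) can be sketched in Python rather than .NET. `fetch_casts_page` is a hypothetical stand-in for a real hub RPC such as a paginated casts-by-fid call; the pool is what does the multiplexing.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_casts_page(fid, page_token=None):
    # placeholder for a real hub RPC (e.g. a gRPC casts-by-fid call);
    # returns (casts, next_page_token), with None meaning no more pages
    data = {1: (["cast-a", "cast-b"], None), 2: (["cast-c"], None)}
    return data.get(fid, ([], None))

def fetch_all(fids, workers=8):
    """Fan one paginated drain per fid across a shared worker pool,
    mimicking multiplexed connections to a hub."""
    def drain(fid):
        casts, token = [], None
        while True:
            page, token = fetch_casts_page(fid, token)
            casts.extend(page)
            if token is None:
                return casts
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(fids, pool.map(drain, fids)))
```

With a real gRPC stub the same shape applies; the worker count trades off hub load against backfill throughput.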

shoni.eth
@alexpaden
i run the neynar parquet service, which is a bit pricey but has some bonus features like spam labels and stuff: https://github.com/alexpaden/neynar_parquet_importer i take that database, which has all the fundamentals (updated every 5 min), then built another pipeline on top to run stuff like this and generate new columns and advanced analytics tables
2 replies
0 recast
0 reaction
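The two-stage setup above (a base table synced by the importer, then a second pass that derives new columns) can be sketched with pandas. The column names here are illustrative, not the importer's actual schema.

```python
import pandas as pd

def add_derived_columns(casts: pd.DataFrame) -> pd.DataFrame:
    """Second-pass pipeline sketch: take the base casts table and
    derive analytics columns from it. Columns are hypothetical."""
    out = casts.copy()
    out["text_len"] = out["text"].str.len()          # cast length
    out["has_link"] = out["text"].str.contains("http", regex=False)
    return out

base = pd.DataFrame({"text": ["gm", "check https://example.com"]})
enriched = add_derived_columns(base)
```

In practice `base` would come from the importer's parquet output (e.g. via `pd.read_parquet`), and the enriched frame would be written back as its own analytics table, so the 5-minute sync and the derived tables stay decoupled.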

Kasra Rahjerdi
@jc4p
love that! i tried doing hubble from the hub-monorepo but it was so slow i wanted to die haha. i kind of REFUSE to pay for open data though, otherwise i'd use neynar too (granted i use their api for a ton of frames)
2 replies
0 recast
0 reaction

Kasra Rahjerdi
@jc4p
why haven’t you uploaded your datasets to HF!!!!
2 replies
0 recast
0 reaction

shoni.eth
@alexpaden
i might upload the more advanced tables/columns that i've been slowly working on / sitting on.. it's on the agenda again
1 reply
0 recast
1 reaction

Kasra Rahjerdi
@jc4p
anything would be fun :) i’m realizing a ton of people could use this data and asking for neynar access isn’t always doable (whole reason i wrote https://github.com/jc4p/fast-hub-client is bc they didn’t give me access when i asked)
2 replies
0 recast
0 reaction

shoni.eth
@alexpaden
yeah fc has a dependency on them. indexing co is also good via bigquery, but bigquery is bad for my other high-volume query use case. @noctis not sure if you've seen kasra yet
1 reply
0 recast
1 reaction

Hector
@noctis
Hey sorry for not answering, I was kinda busy these days, what is it?
1 reply
0 recast
1 reaction

shoni.eth
@alexpaden
something you'd know more about !
0 reply
0 recast
1 reaction