not parzival
@shoni.eth
tested my first farcaster embedding/ai pipeline benchmarks on m2 ultra. here's what i found:

- using mps on bare metal i can scale from ~500 to 50k+ tokens/sec. main bottleneck was no mps support in docker containers despite various workarounds, forcing cpu-only processing.

performance breakdown:
- mps optimized: ~500 tokens/sec per instance
- cpu only: ~70 tokens/sec per instance
- apple neural engine (non-pytorch): ~7 tokens/sec (idk)

running multiple instances via supervisor now. entire pipeline was built using cursor agent. once we add reasoning models, automation potential is huge for handling obscure commands and codebase nav (300 line files).

setup i'm using against 200m+ rows of farcaster casts: batch 256, 36 instances, `sentence-transformers/all-MiniLM-L6-v2`, f16

use case: generic classification of cast/reply, in-thread semantic search. expected ~200gb db load at int8 precision or ~1.2tb at f32.
3 replies
1 recast
17 reactions
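a rough sketch of the storage arithmetic behind the int8 vs f32 numbers in the cast above. it counts raw vector payload only (rows x dimensions x bytes per value) and assumes exactly 200m rows and the 384-dim output of `all-MiniLM-L6-v2`. the function names are made up for illustration. the figures quoted in the cast are larger than what this gives, and the cast doesn't say what else is counted; indexes, metadata, or extra rows are possible explanations, but that's a guess.

```python
# raw embedding storage estimate: rows * dims * bytes per value.
# assumption: vectors only, no index, metadata, or row overhead.

EMBED_DIM = 384  # output dimension of sentence-transformers/all-MiniLM-L6-v2

# bytes used by one vector component at each precision
BYTES_PER_VALUE = {"f32": 4, "f16": 2, "int8": 1}


def raw_vector_bytes(rows: int, dim: int, precision: str) -> int:
    """Bytes needed to store `rows` vectors of length `dim` at `precision`."""
    return rows * dim * BYTES_PER_VALUE[precision]


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


def storage_table(rows: int, dim: int = EMBED_DIM) -> dict:
    """Raw payload in GB for every precision in BYTES_PER_VALUE."""
    return {p: to_gb(raw_vector_bytes(rows, dim, p)) for p in BYTES_PER_VALUE}
```

calling `storage_table(200_000_000)` gives the raw payload per precision in gb. going from f32 to int8 cuts the vector payload by 4x, since each value shrinks from 4 bytes to 1.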
not parzival
@shoni.eth
texts per sec, not tokens per sec, my bad
1 reply
0 recast
7 reactions
not parzival
@shoni.eth
a "text" here is one whole string of text passed in for embedding
0 reply
0 recast
1 reaction
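since the numbers in this thread are texts per second (whole strings, not tokens), here's a minimal sketch of how that could be measured for one instance. it is not the pipeline from the thread. `batches`, `texts_per_sec`, and `load_minilm_encoder` are hypothetical helper names, and the loader assumes `torch` and `sentence-transformers` are installed. the mps-with-cpu-fallback choice and the fp16 cast mirror the setup described above (batch 256, f16), but how the original code did it is not stated.

```python
import time
from typing import Callable, Iterator, Sequence


def batches(texts: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]


def texts_per_sec(encode: Callable[[Sequence[str]], object],
                  texts: Sequence[str],
                  batch_size: int = 256) -> float:
    """Run `encode` over all texts in batches and return texts embedded per second."""
    start = time.perf_counter()
    done = 0
    for batch in batches(texts, batch_size):
        encode(batch)
        done += len(batch)
    elapsed = time.perf_counter() - start
    return done / elapsed if elapsed > 0 else float("inf")


def load_minilm_encoder() -> Callable[[Sequence[str]], object]:
    """Hypothetical loader: returns an encode function for all-MiniLM-L6-v2.

    Imports are local so the timing helpers above work without torch installed.
    """
    import torch
    from sentence_transformers import SentenceTransformer

    # use apple's mps backend when available, otherwise fall back to cpu
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=device)
    if device == "mps":
        model.half()  # fp16 weights; assumption that this matches the "f16" in the setup
    return lambda batch: model.encode(list(batch), batch_size=len(batch))
```

usage would look like `texts_per_sec(load_minilm_encoder(), my_texts)`, where `my_texts` is a list of cast strings. the first batch includes warm-up cost, so a longer run gives a steadier number.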