not parzival
@shoni.eth
tested my first farcaster embedding/ai pipeline benchmarks on m2 ultra. here's what i found:

- using mps bare metal i can scale from ~500 to 50k+ tokens/sec. main bottleneck was no mps support in docker containers despite various workarounds, forcing cpu-only processing.

performance breakdown:

- mps optimized: ~500 tokens/sec per instance
- cpu only: ~70 tokens/sec per instance
- apple neural engine (non-pytorch): ~7 tokens/sec (idk)

running multiple instances via supervisor now. the entire pipeline was built using cursor agent. once we add reasoning models, automation potential is huge for handling obscure commands and codebase nav (300-line files).

setup i'm using against 200m+ rows of farcaster casts: batch 256, 36 instances, `sentence-transformers/all-MiniLM-L6-v2`, f16.

use case: generic classification of cast/reply, in-thread semantic search. expected ~200gb db load at int8 precision or ~1.2tb at f32.
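a minimal sketch of what one such worker could look like, assuming casts arrive as plain strings: the `embed_casts` helper, the mps/cpu fallback, and the example inputs are illustrative, not taken from the actual pipeline.

```python
# minimal sketch of one embedding worker (illustrative, not the actual pipeline;
# db writes and the supervisor wiring are not shown)
import torch
from sentence_transformers import SentenceTransformer

# fall back to cpu when mps isn't available, e.g. inside docker, where mps
# passthrough isn't supported and everything ends up cpu-only
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=device)
if device == "mps":
    model = model.half()  # f16 weights, matching the setup above

def embed_casts(texts: list[str]) -> torch.Tensor:
    # batch 256 matches the setup above; convert_to_tensor keeps results on-device
    return model.encode(
        texts,
        batch_size=256,
        convert_to_tensor=True,
        normalize_embeddings=True,  # assumption: cosine-style semantic search
        show_progress_bar=False,
    )

if __name__ == "__main__":
    vecs = embed_casts(["gm", "testing farcaster embeddings on m2 ultra"])
    print(vecs.shape)  # (2, 384) -- all-MiniLM-L6-v2 outputs 384-dim vectors
```

for the 36-instance fan-out, one common supervisord pattern is a `[program:...]` section with `numprocs=36` and `process_name=%(program_name)s_%(process_num)02d`; the actual config isn't shown in the post.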
3 replies
1 recast
18 reactions
not parzival
@shoni.eth
text per sec, not tokens per sec, my bad
1 reply
0 recast
7 reactions
not parzival
@shoni.eth
another note: my expected db loads were calculated for openai's 1536-dim embeddings. i downsized for this situation, i forget the exact size (all-MiniLM-L6-v2 outputs 384-dim vectors), but db impact will be much less (low-alpha embedding, not worth much hardware investment)
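as a back-of-envelope check on that scaling (raw vector bytes only, ignoring index and metadata overhead; the ~200m row count comes from the first post, everything else here is illustrative): the int8 figure below lands a bit above the quoted ~200gb, which presumably reflects a different effective row count or extra compression.

```python
# raw vector payload: rows * dims * bytes per value (no index/metadata overhead)
def vector_storage_gb(rows: int, dims: int, bytes_per_value: int) -> float:
    return rows * dims * bytes_per_value / 1e9

rows = 200_000_000  # ~200m+ farcaster casts

# openai-sized 1536-dim vectors, the basis for the original estimates
print(vector_storage_gb(rows, 1536, 4))  # ~1229 gb at f32 (the ~1.2tb figure)
print(vector_storage_gb(rows, 1536, 1))  # ~307 gb at int8

# all-MiniLM-L6-v2's 384-dim vectors cut that by 4x
print(vector_storage_gb(rows, 384, 4))   # ~307 gb at f32
print(vector_storage_gb(rows, 384, 1))   # ~77 gb at int8
```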
0 reply
0 recast
3 reactions
Marvin Heemeyer
@conspirator
9134 $degen
1 reply
0 recast
0 reaction