shoni.eth pfp
shoni.eth
@alexpaden
finally cracked the optimization and started pushing 400,000 casts (texts) per second to inference on the mac studio
5 replies
2 recasts
17 reactions

rish pfp
rish
@rish
what was the solve? and what is the end outcome you're going for?
1 reply
0 recast
0 reaction

shoni.eth pfp
shoni.eth
@alexpaden
really a few simple things: preallocated buffers, torchscript/quantize/mps on the mac. the thing i hadn't caught was some delays in keeping the models fed with a steady stream of data, which was just query/index based. this pipeline is specifically for cast text embeddings, probably similar to the type you guys will be providing (384 dims, int8 quantized for storage, float16 model precision). rough shape of it sketched below.
0 reply
0 recast
1 reaction
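A minimal sketch of the shape of the pipeline described above, not @alexpaden's actual code: it assumes a MiniLM-class sentence-transformers model (which outputs 384-dim vectors), PyTorch's MPS backend on the Mac Studio, and a hypothetical fixed batch size. The TorchScript step and the query/index feeder that keeps the stream steady are elided; `quantize_int8` is an illustrative name, not a library call.

```python
# Hedged sketch of the described pipeline: float16 model precision on MPS,
# a preallocated output buffer, and int8 quantization of 384-dim embeddings
# for storage. Model name and batch size are assumptions, not from the thread.
import numpy as np
import torch
from sentence_transformers import SentenceTransformer

DEVICE = "mps" if torch.backends.mps.is_available() else "cpu"
DIM, BATCH = 384, 1024  # 384-dim embeddings per the thread; batch size assumed

# float16 model precision on the Apple GPU via the MPS backend
# (TorchScript compilation of the transformer is omitted here for brevity)
model = SentenceTransformer("all-MiniLM-L6-v2", device=DEVICE)
model.half()

# preallocated int8 buffer reused across batches so the hot loop never allocates
out = np.empty((BATCH, DIM), dtype=np.int8)


def quantize_int8(emb: np.ndarray) -> np.ndarray:
    """Symmetric per-vector quantization: float embeddings -> int8 for storage."""
    scale = np.abs(emb).max(axis=1, keepdims=True) / 127.0 + 1e-12
    return np.clip(np.round(emb / scale), -127, 127).astype(np.int8)


def embed_batch(texts: list[str]) -> np.ndarray:
    """Encode one batch of cast texts into 384-dim int8 vectors."""
    with torch.inference_mode():
        emb = model.encode(texts, batch_size=len(texts), convert_to_numpy=True)
    out[: len(texts)] = quantize_int8(emb.astype(np.float32))
    return out[: len(texts)]
```

The int8 step trades a small amount of recall for a 4x storage reduction versus float32; keeping the model itself in float16 is a separate choice that mainly buys inference throughput on the GPU.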