Yhprum
@yhprumslaw
no clue if I’m right but guessing from reading this that it’s some type of tensor splitting where a model is sharded across nodes. no one holds the full weight set; each node gets a tensor chunk, like a weight matrix slice, cutting comms to 1-5MB/step over slow nets.
Yhprum
@yhprumslaw
tensor splitting divides big arrays (e.g., a 1B-param layer) into sub-tensors. node a might get rows 1-100, node b rows 101-200, etc., so the data exchanged per step shrinks to fit low bandwidth.
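roughly what I picture for those row slices, as a quick numpy toy (shard_rows and the 10-node split are my own illustration, not anything from the blog):

```python
# split a weight matrix into row shards, one per node; each node then
# holds and syncs only its slice instead of the full tensor
import numpy as np

def shard_rows(weight: np.ndarray, num_nodes: int) -> list[np.ndarray]:
    return np.array_split(weight, num_nodes, axis=0)

weight = np.random.randn(1000, 1000).astype(np.float32)  # a ~4MB layer
shards = shard_rows(weight, num_nodes=10)
# node a holds rows 0-99, node b rows 100-199, and so on
print(shards[0].shape, shards[0].nbytes / 1e6, "MB on each node")  # (100, 1000) 0.4 MB
```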
Yhprum
@yhprumslaw
maybe then prediction kicks in: each node guesses its neighbors’ tensors with a small model (e.g., an LSTM) trained on past data. syncs happen only when errors spike, saving bandwidth further.
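a minimal sketch of that trigger, assuming an exponential moving average as a stand-in for the small model (an LSTM would slot into the same place); PredictiveSync and the 5% threshold are my inventions:

```python
# sender-side trigger: keep a running guess of the tensor, and only
# flag a sync when the guess drifts too far from the real value
import numpy as np

class PredictiveSync:
    def __init__(self, shape, threshold=0.05, decay=0.9):
        self.prediction = np.zeros(shape, dtype=np.float32)
        self.threshold = threshold  # relative error that forces a sync
        self.decay = decay          # EMA memory, the stand-in "predictor"

    def step(self, actual: np.ndarray) -> bool:
        """return True when a full sync must go over the wire this step."""
        err = np.linalg.norm(actual - self.prediction) / (np.linalg.norm(actual) + 1e-8)
        # update the predictor on history either way
        self.prediction = self.decay * self.prediction + (1 - self.decay) * actual
        return err > self.threshold

sync = PredictiveSync(shape=(100, 1000))
grad = np.random.randn(100, 1000).astype(np.float32)
if sync.step(grad):
    pass  # ship the tensor (or a correction) to the neighbor
```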
Yhprum
@yhprumslaw
over a 10Mbps net, full tensor syncs (100MB) crawl, but sharding + prediction drops it to 1MB bursts. training stays fast as nodes compute locally, syncing less often… such a fascinating intersection of distributed systems and the idea that open source will be won by AI
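quick back-of-envelope behind the “crawl” claim, using the 10Mbps and 100MB/1MB numbers guessed above:

```python
# transfer time = megabytes * 8 (bits per byte) / link rate in megabits/sec
def transfer_seconds(megabytes: float, link_mbps: float) -> float:
    return megabytes * 8 / link_mbps

print(transfer_seconds(100, 10))  # 80.0 s per full 100MB sync
print(transfer_seconds(1, 10))    # 0.8 s per 1MB burst
```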
Yhprum
@yhprumslaw
still thinking about this more… to further minimize communication, nodes could predict incoming tensors (e.g., activations or gradients from other shards) using lightweight models, like an LSTM or small transformer, trained on historical patterns. Long doesn’t explicitly mention prediction in the blog, but my guess is this aligns with solving the “low-bandwidth bottleneck” he highlights. syncs would occur only when predictions diverge significantly, cutting comms frequency. would be interested to get his input if you know his handle, @fredwilson.eth
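putting the whole guess together as a sketch: both ends keep identical predictors, the sender ships the tensor only when the shared guess diverges, and the receiver otherwise falls back on its local prediction. every name here (Predictor, maybe_send, receive) is mine; this is my reading of the idea, not Long’s actual method:

```python
import numpy as np

class Predictor:
    """tiny EMA stand-in for the lightweight model, run identically on both ends."""
    def __init__(self, shape, decay=0.9):
        self.state = np.zeros(shape, dtype=np.float32)
        self.decay = decay

    def predict(self) -> np.ndarray:
        return self.state

    def update(self, tensor: np.ndarray):
        self.state = self.decay * self.state + (1 - self.decay) * tensor

def maybe_send(pred: Predictor, actual: np.ndarray, threshold=0.05):
    """sender: return the tensor only when the shared guess diverges."""
    err = np.linalg.norm(actual - pred.predict()) / (np.linalg.norm(actual) + 1e-8)
    payload = actual if err > threshold else None
    # when nothing is sent, update on the prediction (not the true tensor)
    # so the sender's and receiver's predictors stay in lockstep
    pred.update(actual if payload is not None else pred.predict())
    return payload

def receive(pred: Predictor, payload):
    """receiver: use the real tensor if one arrived, else the local prediction."""
    tensor = payload if payload is not None else pred.predict()
    pred.update(tensor)
    return tensor
```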