Yhprum
@yhprumslaw
no clue if I’m right but guessing from reading this that it’s some type of tensor splitting where a model is sharded across nodes. no one holds the full weight set; each node gets a tensor chunk, like a weight matrix slice, cutting comms to 1-5MB/step over slow nets.
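rough back-of-the-envelope on why that helps (toy python, every number here is my own assumption, not from the blog): the weight shards stay put on their nodes, so the only thing that has to cross the wire each step is the small boundary tensor.

```python
# hypothetical sizes, picked just to show the traffic difference
hidden = 4096        # layer width
tokens = 256         # tokens processed per step (batch * seq)
nodes = 4            # participants, one shard each
bytes_per = 4        # fp32

weights_bytes = hidden * hidden * bytes_per   # one dense layer, ~67MB
shard_bytes = weights_bytes // nodes          # stored per node, never shipped per step
acts_bytes = tokens * hidden * bytes_per      # what actually crosses the wire, ~4MB

print(f"full layer:       {weights_bytes / 1e6:.0f} MB")
print(f"per-node shard:   {shard_bytes / 1e6:.0f} MB (stays local)")
print(f"per-step traffic: {acts_bytes / 1e6:.1f} MB (just the activations)")
```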
Yhprum
@yhprumslaw
tensor splitting divides big arrays (e.g., a 1B-param layer) into sub-tensors. node a might get rows 1-100, node b 101-200, etc., slashing data sent between them to fit low bandwidth.
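to make the row split concrete, a tiny numpy sketch (node count, shapes, and the np.array_split call are mine, just for illustration): each node keeps only its block of rows, computes its partial output locally, and stacking the partials matches the full matmul.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((400, 512)).astype(np.float32)   # pretend full weight matrix
x = rng.standard_normal(512).astype(np.float32)

# split W by rows across 4 nodes: node a gets rows 0-99, node b 100-199, ...
shards = np.array_split(W, 4, axis=0)

# each node computes its partial output locally, holding only its own shard
partials = [shard @ x for shard in shards]

# concatenating the partial outputs reproduces the full layer output
y_sharded = np.concatenate(partials)
y_full = W @ x
assert np.allclose(y_sharded, y_full, atol=1e-4)
```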
Yhprum
@yhprumslaw
maybe then prediction kicks in: each node guesses its neighbors’ tensors with a small model (e.g., an lstm) trained on past data. syncs happen only when errors spike, saving bandwidth further.
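sketching what i imagine that looks like: i’m swapping the lstm for a dumb exponential-moving-average predictor and inventing a 5% threshold, so treat this as the shape of the idea, not anything pluralis actually describes.

```python
import numpy as np

class EMAPredictor:
    """stand-in for the small predictive model (lstm etc.): guesses the
    neighbor's next tensor as an exponential moving average of past ones."""
    def __init__(self, shape, alpha=0.9):
        self.estimate = np.zeros(shape, dtype=np.float32)
        self.alpha = alpha

    def predict(self):
        return self.estimate

    def update(self, observed):
        self.estimate = self.alpha * self.estimate + (1 - self.alpha) * observed

def maybe_sync(actual, predictor, tol=0.05):
    """transmit the real tensor only when the prediction error spikes past tol."""
    err = np.linalg.norm(actual - predictor.predict()) / (np.linalg.norm(actual) + 1e-8)
    if err > tol:
        predictor.update(actual)   # in a real system: send `actual` over the network too
        return True                # a sync happened this step
    return False                   # prediction was close enough, nothing sent
```

nice property of this framing: if the predictions are always bad it just degrades back to syncing every step, so at least in this toy version it can’t be worse than plain sharding.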
Yhprum
@yhprumslaw
over a 10Mbps net, full tensor syncs (100MB) crawl, but sharding + prediction drops them to 1MB bursts. training stays fast since nodes compute locally and sync less often… such a fascinating intersection of distributed systems and the idea that open source will win in AI
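quick math behind that, assuming the whole 10Mbps link is usable and the payload sizes above:

```python
link_mbps = 10        # assumed link speed
full_sync_mb = 100    # shipping full tensors every step
burst_mb = 1          # occasional correction burst with sharding + prediction

def transfer_seconds(megabytes, mbps):
    return megabytes * 8 / mbps   # MB -> megabits, divided by link rate

print(transfer_seconds(full_sync_mb, link_mbps))  # 80.0 s: the step stalls on the network
print(transfer_seconds(burst_mb, link_mbps))      # 0.8 s: hides behind local compute
```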
Yhprum
@yhprumslaw
still thinking about this more… to further minimize communication, nodes could predict incoming tensors (e.g., activations or gradients from other shards) using lightweight models, like an lstm or small transformer, trained on historical patterns. long doesn’t explicitly mention prediction in the blog, but my guess is this aligns with solving the “low-bandwidth bottleneck” he highlights. syncs would occur only when predictions diverge significantly, cutting comms frequency. would be interested to get his input if you know his handle @fredwilson.eth
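pure speculation on the mechanics, but the “sync only on divergence” rule would probably sit right at the shard boundary, roughly like this (predictor interface, threshold, and the `send` callback are all invented stand-ins):

```python
import numpy as np

def boundary_exchange(actual, predictor, send, tol=0.05):
    """at the edge between two shards: both sides run the same small predictor,
    so the sender transmits the real activations/gradients only when the shared
    prediction has diverged past tol; otherwise the receiver proceeds on its
    local guess and nothing crosses the network this step."""
    guess = predictor.predict()
    err = np.linalg.norm(actual - guess) / (np.linalg.norm(actual) + 1e-8)
    if err > tol:
        send(actual)              # real network transfer, the rare case
        predictor.update(actual)  # keep both sides' predictors in step
        return actual
    return guess                  # cheap case: no bytes sent
```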
Yhprum
@yhprumslaw
would models be delivered split? no full weight set exists - nodes host shards (e.g., tensor slices). delivery would be inference via a network api, querying shards live over the internet. so users call an endpoint, like ‘api.pluralis.ai/modelX’, the system routes inputs to the shard-hosting nodes, each processes its piece, and the outputs are merged and returned, all without the model ever being centralized?
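if that guess is right, the serving path could be a thin router like the sketch below. everything in it is hypothetical: the node registry, the stand-in weights, the fan-out shape; ‘api.pluralis.ai/modelX’ is just the made-up endpoint name from above.

```python
import numpy as np

# hypothetical registry: which node serves which row range of the layer
SHARD_NODES = {
    "node-a": slice(0, 100),
    "node-b": slice(100, 200),
    "node-c": slice(200, 300),
}

def query_shard(node, x):
    """placeholder for the real network call to a shard-hosting node;
    here each 'node' just applies the slice of weights it would hold."""
    rows = SHARD_NODES[node]
    w_shard = np.ones((rows.stop - rows.start, 64), dtype=np.float32)  # stand-in weights
    return w_shard @ x

def handle_request(x):
    """what an endpoint like api.pluralis.ai/modelX might do behind the scenes:
    fan the input out to shard holders, merge the partial outputs, return."""
    partials = [query_shard(node, x) for node in SHARD_NODES]
    return np.concatenate(partials)

y = handle_request(np.ones(64, dtype=np.float32))
print(y.shape)   # (300,) - assembled without any single node holding the full model
```

the point being: the gateway only ever sees inputs and merged outputs; the weights themselves never leave the shard hosts.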