Yhprum
@yhprumslaw
no clue if I’m right, but guessing from reading this that it’s some type of tensor splitting where a model is sharded across nodes. no one holds the full weight set: each node gets a tensor chunk, like a weight-matrix slice, cutting comms to 1-5 MB/step over slow nets.
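no idea what the real numbers are, but a quick back-of-envelope sketch (all sizes here are my own made-up assumptions, fp16 activations) shows how shipping only one node's slice can land in that low-MB/step range:

```python
# hypothetical sizing check: if a node only ships its slice of the
# activations instead of full weights/gradients, per-step traffic shrinks.
# batch/hidden/node counts below are illustrative guesses, not from the post.

def slice_traffic_mb(batch, hidden, num_nodes, bytes_per_elem=2):
    """MB sent per step if a node ships its activation slice (fp16 = 2 bytes)."""
    elems = batch * (hidden // num_nodes)  # one node's slice of the activations
    return elems * bytes_per_elem / 1e6

# e.g. batch=512, hidden=4096, split across 8 nodes
print(slice_traffic_mb(512, 4096, 8))  # -> 0.524288, i.e. ~0.5 MB/step
```

scale the guesses up or down and you can see where the 1-5 MB figure might come from.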

Yhprum
@yhprumslaw
tensor splitting divides big arrays (e.g., a 1B-param layer) into sub-tensors. node a might get rows 1-100, node b rows 101-200, etc., slashing the data sent between them so it fits low-bandwidth links.
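a toy version of that row split (tiny sizes, obviously not a 1B-param layer): each "node" holds only its row slice, computes its piece of the output, and concatenating the pieces recovers the full result.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(200, 64))  # full weight matrix; no single node stores it
x = rng.normal(size=64)         # input vector

# node a gets rows 0-99, node b gets rows 100-199
shards = np.array_split(W, 2, axis=0)

# each node computes only its slice of the output
partials = [shard @ x for shard in shards]

# stitching the partial outputs back together matches the full matmul
y = np.concatenate(partials)
assert np.allclose(y, W @ x)
```

the nice part is that during compute, nodes only ever exchange those small partial outputs, never the weight shards themselves.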

Yhprum
@yhprumslaw
maybe then prediction kicks in: each node guesses its neighbors’ tensors with a small model (e.g., an LSTM) trained on past updates. syncs happen only when prediction error spikes, saving further bandwidth.
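pure speculation, but the error-triggered sync idea is easy to sketch. standing in for the LSTM with the dumbest possible predictor (hold the last synced copy); the drift rate and threshold are made-up assumptions:

```python
import numpy as np

# sketch of error-triggered syncing. the post imagines a learned predictor
# (e.g. an LSTM); a hold-last-value "predictor" stands in here, and the
# tensor size, drift, and threshold are all invented for illustration.

rng = np.random.default_rng(0)
actual = np.zeros(100)    # neighbor's real tensor, drifting every step
synced = actual.copy()    # our local stale copy, used as the prediction
THRESH = 0.5
syncs = 0

for step in range(50):
    actual = actual + rng.normal(0, 0.01, size=100)  # slow training drift
    err = np.linalg.norm(actual - synced)            # prediction error
    if err > THRESH:              # only sync when the guess has degraded
        synced = actual.copy()    # the only moment bandwidth is spent
        syncs += 1

print(syncs)  # a handful of syncs instead of 50 full transfers
```

a smarter predictor would track the drift direction and push the sync count even lower, at the cost of compute on each node.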