𝚐𝔪𝟾𝚡𝚡𝟾 pfp
𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
Nous Research has released a preliminary report on DisTrO (Distributed Training Over-the-Internet), a set of distributed optimizers that drastically reduces inter-GPU communication requirements by 1,000x to 10,000x without relying on amortized analysis, while maintaining the convergence rates of AdamW+All-Reduce. Decentralized training: we are so back! The last step is distributed tensor processing, allowing anyone to contribute with consumer GPUs (see the sketch after this cast). https://github.com/NousResearch/DisTrO/blob/main/A_Preliminary_Report_on_DisTrO.pdf
1 reply
0 recast
9 reactions
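To make the bandwidth claim concrete, here is a minimal sketch of the generic idea behind communication-reduced optimizers: instead of all-reducing a full fp32 gradient every step, each node ships a small, compressed update. This is not DisTrO's actual algorithm (the preliminary report does not disclose it); the top-k scheme, parameter count, and compression ratio below are illustrative assumptions only.

```python
# Illustrative sketch only: DisTrO's real method is not public.
# Shows how exchanging a sparse/compressed update instead of full
# fp32 gradients shrinks the per-step communication payload.
import torch

def allreduce_payload_bytes(num_params: int) -> int:
    # Standard data-parallel AdamW: all-reduce one fp32 gradient
    # value per parameter every step.
    return num_params * 4  # 4 bytes per fp32 element

def topk_payload_bytes(k: int) -> int:
    # Hypothetical compressed step: send only the k largest-magnitude
    # gradient entries as (int32 index, fp32 value) pairs.
    return k * (4 + 4)

def topk_compress(grad: torch.Tensor, k: int):
    # Keep the k largest-magnitude entries of the gradient; in practice
    # the dropped residual would be accumulated locally for later steps.
    flat = grad.flatten()
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

if __name__ == "__main__":
    n = 1_200_000_000        # e.g. a ~1.2B-parameter model (assumed)
    k = n // 4_000           # send ~0.025% of entries per step (assumed)
    full = allreduce_payload_bytes(n)
    sparse = topk_payload_bytes(k)
    print(f"all-reduce payload : {full / 1e9:.2f} GB/step")
    print(f"compressed payload : {sparse / 1e6:.2f} MB/step "
          f"(~{full / sparse:.0f}x smaller)")

    # Tiny functional demo of the compression itself.
    g = torch.randn(10_000)
    idx, vals = topk_compress(g, k=10)
    print("top-10 |g| indices:", idx.tolist())
```

The point of the sketch is only the arithmetic: shipping a carefully chosen sliver of the update per step is what moves training from datacenter interconnects into the range of consumer internet links, which is the regime DisTrO's 1,000x to 10,000x reduction targets.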

Tyler Cosgrove pfp
Tyler Cosgrove
@tylercosgrove
Excited for Proof-of-Compute-style networks to emerge from stuff like this. Those will be the real open-source models.
0 reply
0 recast
0 reaction