𝚐𝔪𝟾𝚡𝚡𝟾 pfp
𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
Nous Research has released a preliminary report on DisTrO (Distributed Training Over-the-Internet), a set of distributed optimizers that drastically reduces inter-GPU communication requirements by 1,000x to 10,000x without relying on amortized analysis, while maintaining the convergence rates of AdamW+All-Reduce. Decentralized training: we are so back! The last step is distributed tensor processing, allowing anyone to contribute with consumer GPUs (see the sketch after this cast). https://github.com/NousResearch/DisTrO/blob/main/A_Preliminary_Report_on_DisTrO.pdf
1 reply
0 recast
9 reactions
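To make the bandwidth claim concrete, here is a minimal sketch of the generic idea behind communication-reduced optimizers: instead of all-reducing a full fp32 gradient every step, each node ships a small, compressed update. This is not DisTrO's actual algorithm (the preliminary report does not disclose it); the top-k scheme, parameter count, and compression ratio below are illustrative assumptions only.

```python
# Illustrative sketch only: DisTrO's real method is not public.
# Shows how exchanging a sparse/compressed update instead of full
# fp32 gradients shrinks the per-step communication payload.
import torch

def allreduce_payload_bytes(num_params: int) -> int:
    # Standard data-parallel AdamW: all-reduce one fp32 gradient
    # value per parameter every step.
    return num_params * 4  # 4 bytes per fp32 element

def topk_payload_bytes(k: int) -> int:
    # Hypothetical compressed step: send only the k largest-magnitude
    # gradient entries as (int32 index, fp32 value) pairs.
    return k * (4 + 4)

def topk_compress(grad: torch.Tensor, k: int):
    # Keep the k largest-magnitude entries of the gradient; in practice
    # the dropped residual would be accumulated locally for later steps.
    flat = grad.flatten()
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

if __name__ == "__main__":
    n = 1_200_000_000        # e.g. a ~1.2B-parameter model (assumed)
    k = n // 4_000           # send ~0.025% of entries per step (assumed)
    full = allreduce_payload_bytes(n)
    sparse = topk_payload_bytes(k)
    print(f"all-reduce payload : {full / 1e9:.2f} GB/step")
    print(f"compressed payload : {sparse / 1e6:.2f} MB/step "
          f"(~{full / sparse:.0f}x smaller)")

    # Tiny functional demo of the compression itself.
    g = torch.randn(10_000)
    idx, vals = topk_compress(g, k=10)
    print("top-10 |g| indices:", idx.tolist())
```

The point of the sketch is only the arithmetic: shipping a carefully chosen sliver of the update per step is what moves training from datacenter interconnects into the range of consumer internet links, which is the regime DisTrO's 1,000x to 10,000x reduction targets.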

Tyler Cosgrove pfp
Tyler Cosgrove
@tylercosgrove
Excited for Proof-of-Compute-style networks to emerge from stuff like this. Those will be the real open-source models.
0 reply
0 recast
0 reaction