JB Rubinovitz ⌐◨-◨
@rubinovitz
“DisTrO (Distributed Training Over-the-Internet) a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by 1000x to 10,000x without relying on amortized analysis, and matches AdamW+All-Reduce in convergence rates. This enables low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware.” Report: https://github.com/NousResearch/DisTrO/blob/main/A_Preliminary_Report_on_DisTrO.pdf (the link has issues loading on mobile)
1 reply
0 recast
11 reactions

Nadav
@nadav
love Nous. should we get those guys in here?
2 replies
0 recast
4 reactions