ChrSzegedy
@0808405080840583
Stop discarding your old gradients! Introducing AdEMAMix, a novel (first-order) optimizer capable of outperforming Adam. Let’s have a thread on momentum and the surprising relevance of very old gradients. A joint work with @GrangierDavid and @PierreAblin #ml #optimization 1/🧵 https://t.co/MbGVcSIPdg