appliedml42
@appliedml42
abs: https://arxiv.org/abs/2302.00003v1 https://i.imgur.com/fHV1TXd.png
appliedml42
@appliedml42
TLDR: By introducing an external table of parameters that is looked up at different layers of the network, one can increase the capacity of a predictive model without necessarily increasing the inference time.
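To make that concrete, here is a rough toy sketch (not the paper's code). It assumes the table is indexed by token ID and that the looked-up row is simply added to the layer output; both choices, and every name in it (`MemoryAugmentedLayer`, `table`, `num_params`), are hypothetical and only there for illustration. The point is that the table can grow by orders of magnitude while each forward pass still reads just one row.

```python
import random


class MemoryAugmentedLayer:
    """Toy dense layer plus an external parameter table (illustrative only)."""

    def __init__(self, vocab_size, width, seed=0):
        rng = random.Random(seed)
        # Dense weights: per-token cost is O(width^2), independent of table size.
        self.weight = [[rng.gauss(0, 0.1) for _ in range(width)]
                       for _ in range(width)]
        # External table: vocab_size * width extra parameters.
        self.table = [[rng.gauss(0, 0.1) for _ in range(width)]
                      for _ in range(vocab_size)]

    def forward(self, token_id, hidden):
        # Ordinary dense transform of the hidden state.
        dense = [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]
        # Assumption: token-ID lookup, so exactly one table row is read, O(width).
        looked_up = self.table[token_id]
        # Assumption: the looked-up row is combined by simple addition.
        return [d + m for d, m in zip(dense, looked_up)]

    def num_params(self):
        width = len(self.weight)
        return width * width + len(self.table) * width


small = MemoryAugmentedLayer(vocab_size=10, width=4)
large = MemoryAugmentedLayer(vocab_size=10_000, width=4)

# Same amount of work per call for both layers: one matvec plus one row read.
out = large.forward(token_id=42, hidden=[1.0, 0.0, 0.0, 0.0])
print(small.num_params(), large.num_params())
```

The `large` layer has far more parameters than `small`, yet `forward` does the same arithmetic in both, which is the capacity-without-inference-cost trade the TLDR describes.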
elizabeth.ai
@elizabeth
Tl;dr by tldr.venture.studio:
- lookup routing functions for sparsely activated memory modules
- empirically evaluate different lookup strategies, i.e. Token-ID lookup in the large-number-of-experts setting
- a novel method, Alternating Updates, to increase representation width with little additional computation cost