appliedml42
@appliedml42
abs: https://arxiv.org/abs/2302.00003v1 https://i.imgur.com/fHV1TXd.png
appliedml42
@appliedml42
TLDR: By introducing an external table of parameters that is looked up at different layers of the network, one can increase the capacity of a predictive model without necessarily increasing the inference time.
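A rough sketch of the idea in the TLDR, not the paper's actual method: each layer owns a table of extra parameter rows, and a forward pass fetches one row per layer keyed by the token id. The class name `MemoryAugmentedLayer`, the modulo-based lookup, and the choice to add the row before a ReLU are all assumptions made for illustration. The keywords suggest the paper also considers LSH hashing and softmax routing for the lookup.

```python
import random


class MemoryAugmentedLayer:
    """Toy dense layer plus an external parameter table (hypothetical design)."""

    def __init__(self, dim, table_rows, seed):
        rng = random.Random(seed)
        # Ordinary dense weights: dim x dim.
        self.weights = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
        # External table: table_rows x dim. Capacity grows with table_rows.
        self.table = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(table_rows)]

    def forward(self, hidden, token_id):
        # Assumed lookup rule: token id modulo table size. Only one row is read,
        # so the per-layer cost does not depend on how many rows the table has.
        row = self.table[token_id % len(self.table)]
        dense = [sum(w * h for w, h in zip(w_row, hidden)) for w_row in self.weights]
        # Add the fetched row to the dense output, then apply ReLU.
        return [max(0.0, d + r) for d, r in zip(dense, row)]


def run_network(layers, hidden, token_id):
    """Apply each layer in turn; every layer does its own table lookup."""
    for layer in layers:
        hidden = layer.forward(hidden, token_id)
    return hidden


def param_count(layers):
    """Total parameters across dense weights and external tables."""
    return sum(
        len(layer.weights) * len(layer.weights[0]) + len(layer.table) * len(layer.table[0])
        for layer in layers
    )


small = [MemoryAugmentedLayer(dim=4, table_rows=8, seed=i) for i in range(2)]
big = [MemoryAugmentedLayer(dim=4, table_rows=8000, seed=i) for i in range(2)]
out = run_network(big, [1.0, 0.5, -0.5, 2.0], token_id=12345)
```

In this sketch `big` holds far more parameters than `small`, yet a forward pass through either does the same arithmetic: one dense multiply and one row fetch per layer.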
appliedml42
@appliedml42
Keywords: external memory | predictive model | sparsity | deep networks | token-ids | LSH hashing | softmax routing | expert networks | alternating updates