gm8xx8
@gm8xx8
nGPT: Normalized Transformer with Representation Learning on the Hypersphere paper: arxiv.org/abs/2410.01131 The normalized Transformer (nGPT) is a neural network that performs representation learning on the hypersphere: all vectors, including embeddings and attention and MLP matrices, are normalized to unit norm. Token representations move across the surface of the hypersphere, with each layer contributing a displacement toward the target output predictions. This approach accelerates training, reducing the required steps by a factor of 4 to 20 depending on sequence length, without needing normalization layers, weight decay, or learning rate warmup.
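A minimal sketch of the two ideas the post describes: projecting vectors onto the unit hypersphere, and a layer update that moves a token's representation toward a sub-layer output before renormalizing. The function names, dimensions, and the fixed step size `alpha` here are illustrative assumptions, not the paper's exact parameterization (the paper learns per-dimension step sizes).

```python
import numpy as np

def unit_norm(x, axis=-1, eps=1e-8):
    # Project vectors onto the unit hypersphere along `axis`.
    # eps guards against division by zero for all-zero vectors.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def ngpt_style_update(h, sublayer_out, alpha=0.1):
    # Sketch of the hypersphere update: step from h toward the
    # sub-layer output, then renormalize back onto the sphere.
    # `alpha` is a hypothetical scalar step size for illustration.
    return unit_norm(h + alpha * (unit_norm(sublayer_out) - h))

# Hypothetical example: 4 tokens with 16-dim embeddings.
h = unit_norm(np.random.randn(4, 16))
out = np.random.randn(4, 16)          # stand-in for attention/MLP output
h_next = ngpt_style_update(h, out)
print(np.linalg.norm(h_next, axis=-1))  # each row stays at norm ~1.0
```

The key point the post makes is that because every vector lives on the unit sphere, separate normalization layers, weight decay, and warmup become unnecessary.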
AllowerloHoted
@allowerlohoted
An innovative approach to training neural networks that speeds up training significantly