𝚐π”ͺ𝟾𝚑𝚑𝟾 pfp
𝚐π”ͺ𝟾𝚑𝚑𝟾
@gm8xx8
nGPT: Normalized Transformer with Representation Learning on the Hypersphere paper: arxiv.org/abs/2410.01131 The normalized Transformer (nGPT) is a neural network built on hypersphere representation learning: all vectors, including embeddings and the rows of attention and MLP matrices, are normalized to unit norm. Tokens move across the surface of the hypersphere, with each layer contributing a displacement toward the target predictions. This approach accelerates training, reducing the required steps by a factor of 4 to 20 depending on sequence length, and removes the need for normalization layers, weight decay, and learning rate warmup.
0 reply
0 recast
0 reaction
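The post above can be sketched in a few lines of NumPy. This is a minimal illustration of the hypersphere idea only, not the paper's implementation: the embedding table, the stand-in block output, and the step size `alpha` are all hypothetical, chosen just to show unit-norm projection and a normalized residual-style update.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project each vector onto the unit hypersphere (L2 norm = 1).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

rng = np.random.default_rng(0)

# Hypothetical token embedding table: every row becomes a unit vector.
emb = l2_normalize(rng.standard_normal((5, 16)))

# A layer's output is blended with its input and re-projected onto the
# sphere, so tokens "move along" the hypersphere surface layer by layer.
alpha = 0.1  # hypothetical learned step size
update = l2_normalize(rng.standard_normal((5, 16)))  # stand-in for a block output
h = l2_normalize(emb + alpha * (update - emb))

# All hidden states stay on the unit hypersphere.
assert np.allclose(np.linalg.norm(h, axis=-1), 1.0)
```

Because every vector is renormalized after each update, magnitudes never drift, which is why the paper can drop normalization layers and weight decay.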

AllowerloHoted pfp
AllowerloHoted
@allowerlohoted
An innovative approach to training neural networks that significantly reduces training time
0 reply
0 recast
0 reaction