llm.c training has stabilized and achieves parity with PyTorch training (but faster) 💪 During the last few iterations of training in my previous post, the loss was increasing. It was due to gradient norm clipping, and @karpathy fixed the bug 🎉 With the latest llm.c code, GPT-2 (124M) achieved 35.3% accuracy…
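For context, gradient norm clipping computes the global L2 norm over all gradients and uniformly rescales them when that norm exceeds a threshold; a subtle bug in this step can produce exactly the kind of late-training loss increases described above. Below is a minimal sketch in C (llm.c's language), not the actual llm.c code: the function name `clip_grad_norm` and the parameters `grads`, `num_params`, and `max_norm` are illustrative assumptions.

```c
#include <math.h>
#include <stddef.h>

// Minimal sketch of global gradient-norm clipping (illustrative, not llm.c's
// actual implementation). Scales all gradients uniformly so their global L2
// norm does not exceed max_norm.
void clip_grad_norm(float *grads, size_t num_params, float max_norm) {
    // accumulate the sum of squares in double precision to reduce
    // round-off error over millions of parameters
    double sumsq = 0.0;
    for (size_t i = 0; i < num_params; i++) {
        sumsq += (double)grads[i] * (double)grads[i];
    }
    double norm = sqrt(sumsq);
    // only rescale when the norm exceeds the threshold; scaling in the
    // wrong case (or using a stale norm) is the kind of bug that can
    // cause loss spikes late in training
    if (norm > max_norm) {
        float scale = (float)(max_norm / norm);
        for (size_t i = 0; i < num_params; i++) {
            grads[i] *= scale;
        }
    }
}
```

In a training loop this would run once per step, after the backward pass and before the optimizer update, so the update direction is preserved while its magnitude is bounded.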