Sigrlami
@sigrlami
People wonder why Llama has a knowledge cutoff. The model was trained on 15T tokens and required 1.3M H100-hours of compute. On a 16k-H100 cluster, that works out to roughly 4 days of wall-clock time to train from scratch. Later this year, training should be about twice as fast on the same H100s, compared to the current ~400 TFLOPS/GPU.
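The "roughly 4 days" figure follows directly from the quoted numbers; a minimal sketch of the arithmetic (using the post's figures, with all constants assumed from the post itself):

```python
# Back-of-the-envelope check of the training-time claim in the post.
GPU_HOURS = 1.3e6       # total H100-hours quoted for training
CLUSTER_GPUS = 16_000   # H100s in the assumed cluster
HOURS_PER_DAY = 24

# Wall-clock days if the full cluster runs the job end to end
wall_clock_days = GPU_HOURS / CLUSTER_GPUS / HOURS_PER_DAY
print(f"{wall_clock_days:.1f} days")  # ≈ 3.4 days, i.e. "roughly 4 days"
```

Doubling per-GPU throughput (e.g. from ~400 to ~800 TFLOPS effective) would halve that figure on the same hardware.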
1 reply
0 recast
2 reactions
Sigrlami
@sigrlami
However, making training continuous is still very difficult. So until we reach adequate hardware availability, there will always be a knowledge gap, especially with the amount of information generated daily rising all the time.
0 reply
0 recast
0 reaction