Sigrlami βš›οΈβ„οΈ pfp
Sigrlami βš›οΈβ„οΈ
@sigrlami
People wonder why Llama has a knowledge cutoff. The model was trained on 15T tokens and took about 1.3M H100 hours of training. On a 16k H100 cluster, that works out to roughly 4 days to train from the ground up. 🚀 Later this year, training should be about twice as fast on the same H100s, given the current ~400 TFLOPS/GPU performance.
1 reply
0 recast
2 reactions
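
A rough back-of-the-envelope check of those numbers (a sketch in Python using only the figures quoted in the cast above; the 2x figure is the projected speedup, not a measured one):

# Sanity check: 1.3M H100 GPU-hours spread over a 16k-GPU cluster,
# plus the projected ~2x software speedup on the same hardware.
GPU_HOURS_TOTAL = 1.3e6    # H100 GPU-hours quoted for the training run
CLUSTER_SIZE = 16_000      # H100 GPUs in the cluster
SPEEDUP = 2.0              # assumed end-of-year speedup (projection)

wall_clock_hours = GPU_HOURS_TOTAL / CLUSTER_SIZE
wall_clock_days = wall_clock_hours / 24

print(f"Today: ~{wall_clock_days:.1f} days of wall-clock training")
print(f"With {SPEEDUP:.0f}x speedup: ~{wall_clock_days / SPEEDUP:.1f} days")
# Today: ~3.4 days of wall-clock training
# With 2x speedup: ~1.7 days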

Sigrlami βš›οΈβ„οΈ pfp
Sigrlami βš›οΈβ„οΈ
@sigrlami
However, making training continuous is still very, very difficult. 😓 So until hardware availability catches up, there will always be a knowledge gap, especially with the amount of information generated each day still rising. 👀
0 reply
0 recast
0 reaction