Content
gm8xx8
@gm8xx8
OLMo 2 7B and 13B models, trained on 4-5T tokens, w/ fully open weights, data, and code. The release features an improved architecture for better training stability, a staged training process that mixes in the new Dolmino dataset during annealing, and SOTA OLMo 2 Instruct models. Another banger from Allen AI. https://allenai.org/blog/olmo2
0 replies
5 recasts
24 reactions
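Since the weights are fully open, the checkpoints can be pulled from Hugging Face with the standard transformers causal-LM API. A minimal sketch, assuming the release uses the `allenai/OLMo-2-1124-*` repo naming (check Allen AI's Hugging Face page for the exact identifiers):

```python
# Minimal sketch: loading open OLMo 2 weights with Hugging Face transformers.
# The repo id below is an assumption based on the release naming; the 13B and
# Instruct variants would follow the same pattern.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",           # requires the accelerate package
)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```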