Content
gm8xx8
@gm8xx8
OLMo 2 7B and 13B models, trained on 4-5T tokens, w/ fully open weights, data, and code. The release features an improved architecture for better training stability, a staged training process that mixes in the new Dolmino dataset during annealing, and SOTA OLMo 2 Instruct models. Another banger from Allen AI. https://allenai.org/blog/olmo2
0 replies
5 recasts
24 reactions
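Since the weights are fully open, the checkpoints can be pulled from Hugging Face with the standard transformers causal-LM API. A minimal sketch, assuming the release uses the `allenai/OLMo-2-1124-*` repo naming (check Allen AI's Hugging Face page for the exact identifiers):

```python
# Minimal sketch: loading open OLMo 2 weights with Hugging Face transformers.
# The repo id below is an assumption based on the release naming; the 13B and
# Instruct variants would follow the same pattern.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",           # requires the accelerate package
)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```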