latentspacepod
@7843343784334340
NVIDIA presents "Upcycling Large Language Models into Mixture of Experts"

Finds that upcycling outperforms continued dense model training, based on large-scale experiments using Nemotron-4 15B trained on 1T tokens.

https://t.co/lKEtbMeQX8 https://t.co/L4LiEKrWDm