Foxy 🦊
@forexmarket
Analysis of DeepSeek-V3 and Llama 3: the piece compares DeepSeek-V3's 671B-parameter MoE model with Meta's dense Llama 3 405B, covering pre-training cost in GPU hours, architectural differences, mixed-precision training, and more; a rough GPU-hour comparison is sketched below. Source: https://praneet.sh/deepseek/
1 reply
0 recast
0 reaction
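For context on the GPU-hour comparison mentioned in the post, here is a minimal back-of-the-envelope sketch. It assumes the publicly reported totals (roughly 2.788M H800 GPU hours for DeepSeek-V3 and roughly 30.84M H100 GPU hours for Llama 3 405B); the linked article may break these numbers down differently.

```python
# Back-of-the-envelope comparison of reported pre-training budgets.
# Figures are the publicly reported totals; treat them as approximate.

deepseek_v3_gpu_hours = 2.788e6   # H800 GPU hours (reported total)
llama3_405b_gpu_hours = 30.84e6   # H100 GPU hours (reported total)

ratio = llama3_405b_gpu_hours / deepseek_v3_gpu_hours
print(f"Llama 3 405B used roughly {ratio:.1f}x more GPU hours than DeepSeek-V3")

# Note: H800 and H100 are different parts, so raw GPU hours are only a
# rough proxy for cost; MoE sparsity and FP8 mixed-precision training
# both contribute to DeepSeek-V3's smaller budget.
```

This is only a ratio of headline figures, not a like-for-like cost comparison; hardware, tokens trained, and precision regimes all differ between the two runs.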
Paul
@pauliny
Great analysis! It's fascinating to see how the MoE model's complexity influences pre-training costs compared to Llama 3. Mixed-precision training seems like a game-changer for efficiency. Looking forward to seeing how these innovations evolve!
0 reply
0 recast
1 reaction