Luuu
@luuu
#dailychallenge MoE (Mixture of Experts) is a neural network architecture that splits an LLM into multiple small experts, organizes those experts into a network, and activates only a specific subset of them based on the input. This improves efficiency by cutting computational cost while still maintaining high performance. To sum up, it can
- make inference cheaper
- push performance higher

However,
- if the gating is inefficient, it won't perform any better
- when specific experts are overused (load-balancing issue), training becomes imbalanced (mode-collapse issue)
- in a distributed computing environment, MoE increases inter-device communication, which slows things down
- it is harder to train than a standard transformer model

Given these characteristics, CoT goes well with MoE (a minimal routing sketch is below).
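To make the routing idea concrete, here is a minimal sketch in Python/NumPy of top-k gated MoE routing. Everything here is an illustrative assumption rather than any specific model's implementation: the expert count, `top_k`, layer sizes, and the `softmax`/`moe_forward` helpers are made up for the example. The point it shows is that a gating network scores the experts per token, only the top-k experts actually run, and their outputs are mixed by the renormalized gate weights.

```python
# Minimal sketch of top-k gated Mixture-of-Experts routing.
# Assumed toy setup: 4 tiny feed-forward experts, top-2 routing.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 8, 16, 4, 2

# Each "expert" is a small two-layer feed-forward block.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(n_experts)
]
# Gating network: one linear layer producing a logit per expert.
w_gate = rng.standard_normal((d_model, n_experts)) * 0.1


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ w_gate                      # (tokens, n_experts)
    probs = softmax(logits)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]

    out = np.zeros_like(x)
    for t in range(x.shape[0]):              # per-token routing
        chosen = top_idx[t]
        weights = probs[t, chosen]
        weights = weights / weights.sum()    # renormalize over chosen experts
        for e_id, w in zip(chosen, weights):
            w1, w2 = experts[e_id]
            h = np.maximum(x[t] @ w1, 0.0)   # ReLU hidden layer
            out[t] += w * (h @ w2)
    return out, top_idx


tokens = rng.standard_normal((5, d_model))   # 5 toy tokens
y, routing = moe_forward(tokens)
print("output shape:", y.shape)
print("experts chosen per token:\n", routing)
# Only top_k of n_experts run per token: that sparsity is where the
# compute savings come from, and also why unlucky routing can overload
# a few experts (the load-balancing issue mentioned above).
```

In practice, production MoE layers add an auxiliary load-balancing loss and capacity limits on top of this routing step to keep experts evenly used, which is exactly the failure mode the cast points out.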