mikachip
@mikachip
Interesting to finally get a glimpse into the inner workings of GPT-4. TL;DR: GPT-4 is made up of 16 'expert' models, each of which is ~110B parameters, for ~1.8 trillion total parameters (more than 10x the 175B parameters of GPT-3.5). https://www.semianalysis.com/p/gpt-4-architecture-infrastructure
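A quick back-of-the-envelope check on those numbers (a sketch in Python; the figures are the reported estimates from the linked article, not confirmed specs):

```python
# Reported estimates from the SemiAnalysis article, not confirmed figures.
n_experts = 16
params_per_expert = 110e9  # ~110B parameters per expert

total = n_experts * params_per_expert
print(f"~{total / 1e12:.2f}T total parameters")  # ~1.76T, i.e. ~1.8T
print(f"~{total / 175e9:.1f}x GPT-3.5's 175B")   # ~10.1x
```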
mikachip
@mikachip
Perhaps most interesting is how they've kept inference cost manageable (relative to running inference over the full 1.8T parameters on every query). Each inference forward pass uses only ~280B parameters: ~55B for routing and 2x ~110B for the two 'experts' the query is routed to.
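In code, that kind of scheme looks roughly like a standard top-2 mixture-of-experts layer (a minimal sketch in PyTorch; the toy sizes, gate design, and names here are illustrative assumptions, not the actual GPT-4 implementation):

```python
import torch
import torch.nn.functional as F

# Illustrative top-2 mixture-of-experts routing; sizes are toy values,
# not GPT-4's. Only 2 of 16 experts run per token.
n_experts = 16
d_model = 512  # toy hidden size for illustration

gate = torch.nn.Linear(d_model, n_experts)  # router / gating network
experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)

def moe_forward(x):  # x: (tokens, d_model)
    logits = gate(x)                       # (tokens, n_experts)
    weights, idx = logits.topk(2, dim=-1)  # pick top-2 experts per token
    weights = F.softmax(weights, dim=-1)   # normalize the two gate scores
    out = torch.zeros_like(x)
    for k in range(2):
        for e in range(n_experts):
            mask = idx[:, k] == e          # tokens routed to expert e in slot k
            if mask.any():
                out[mask] += weights[mask, k:k+1] * experts[e](x[mask])
    return out

x = torch.randn(4, d_model)
print(moe_forward(x).shape)  # torch.Size([4, 512])
```

Because only 2 of 16 experts run per token, the per-pass compute tracks the ~275B active parameters (55B + 2x110B, i.e. the ~280B above) rather than the full ~1.76T.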