Models & architecture
Mixture of experts
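MoE: A model design where only a fraction of the parameters (the relevant "experts") activate for each token. The aim is quality close to that of a much larger model at a per-token compute cost closer to that of a smaller one, although all of the parameters typically still have to be held in memory.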
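One common way to choose which experts activate is top-k routing: a small router scores every expert for the current token, and only the k highest-scoring experts run. The sketch below is a minimal illustration of that idea, not any particular model's implementation. The names `top_k_route`, `moe_layer`, and the toy scalar experts are assumptions made up for this example.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Indices of the k largest router scores (ties keep their original order).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, router_logits, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one token; experts 1 and 3 score highest.
logits = [0.1, 2.0, -1.0, 2.0]
output = moe_layer(1.0, experts, logits, k=2)
```

Here only two of the four experts are evaluated for the token; the other two cost nothing at inference time for that token, which is where the savings come from.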
In practice
A 200B-parameter MoE might activate only about 20B parameters per token, so the compute per token is closer to that of a 20B model, which keeps inference cheap.
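The arithmetic behind a figure like this can be sketched as follows. The split used here (8B shared parameters, 64 experts of 3B each, 4 experts active per token) is a hypothetical configuration chosen only so the totals match the 200B and 20B in the example; it does not describe any real model. `moe_param_counts` is likewise an illustrative helper, and it treats each expert's parameters as a single lump summed across all layers.

```python
def moe_param_counts(shared, per_expert, n_experts, top_k):
    """Return (total, active-per-token) parameter counts for a simple MoE."""
    # Every expert is stored, so all of them count toward the total.
    total = shared + n_experts * per_expert
    # Only the top_k routed experts run for a given token.
    active = shared + top_k * per_expert
    return total, active

# Hypothetical configuration, picked to reproduce the 200B / 20B example.
total, active = moe_param_counts(
    shared=8_000_000_000,      # attention, embeddings, and other always-on weights
    per_expert=3_000_000_000,  # parameters in one expert
    n_experts=64,
    top_k=4,
)

print(f"total:  {total // 10**9}B")
print(f"active: {active // 10**9}B")
```

Under these assumptions a token touches 10% of the weights, while memory still has to hold all 200B.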
Image: Jakub Pabis on Pexels. Definition free to reuse under CC BY 4.0.