
Models & architecture

Mixture of experts

MoE

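A model design in which a router sends each token to a small subset of the parameters (the relevant "experts"), so only a fraction of the model activates per token. The aim is quality close to that of a much larger dense model at roughly the per-token compute of a smaller one. All parameters still have to be held in memory, so the saving is in compute, not in model size.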
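The routing idea can be sketched in a few lines. This is a minimal illustration, not any particular model's implementation: it assumes top-k routing with a softmax over the selected experts' scores, and the names `route_token`, `moe_layer`, and the toy scalar "experts" are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token; return (expert_index, weight) pairs."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Assumption: weights are renormalized over the chosen experts only.
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(token, experts, router_logits, top_k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    out = 0.0
    for idx, weight in route_token(router_logits, top_k):
        out += weight * experts[idx](token)  # unselected experts never run
    return out

# Toy setup: four "experts" that just scale their input (hypothetical).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]  # router scores for one token (made up)

routing = route_token(logits, top_k=2)
output = moe_layer(1.0, experts, logits, top_k=2)
print(routing, output)
```

With `top_k=2`, only two of the four experts are evaluated for this token; the other two cost nothing at inference time for that token, which is where the compute saving comes from.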

In practice

A 200B-parameter MoE might activate only about 20B parameters per token, so the compute per token is closer to that of a 20B dense model than a 200B one.
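One way the 200B/20B figures could arise is shown below. The split between shared and expert parameters, the expert count, and the top-k value are all assumed for illustration; they do not describe any specific model.

```python
# Hypothetical configuration chosen so the totals match the example above.
shared_params = 8_000_000_000       # attention, embeddings, etc. (always active)
params_per_expert = 3_000_000_000   # parameters in one expert
num_experts = 64                    # experts available to the router
top_k = 4                           # experts activated per token

total_params = shared_params + num_experts * params_per_expert
active_params = shared_params + top_k * params_per_expert
active_fraction = active_params / total_params

print(f"total:  {total_params / 1e9:.0f}B")    # total:  200B
print(f"active: {active_params / 1e9:.0f}B")   # active: 20B
print(f"share:  {active_fraction:.0%}")        # share:  10%
```

Under these assumptions each token touches 10% of the parameters, while the full 200B still has to be stored and loaded.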



Image: Jakub Pabis on Pexels. Definition free to reuse under CC BY 4.0.