Mixture of experts

Q: What does MoE stand for?

MoE stands for mixture of experts: a model design in which only a fraction of the parameters (the relevant "experts") activate for each token. The aim is quality close to that of a much larger model, at a per-token speed and cost closer to those of a smaller one.
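As a rough illustration of "only the relevant experts activate", here is a minimal sketch of top-k routing in plain Python. Everything in it is a toy assumption: the experts are simple scalar functions, the router scores are made-up numbers for a single token, and the names top_k_route and moe_layer are hypothetical. In a real model the scores come from a learned routing network and each expert is a neural network block.

```python
import math

def top_k_route(scores, k):
    """Return the indices of the k highest-scoring experts and their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the weights sum to 1.
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def moe_layer(x, experts, router_scores, k=2):
    """Weighted sum of the k selected experts; the other experts are never called."""
    chosen, weights = top_k_route(router_scores, k)
    output = sum(w * experts[i](x) for i, w in zip(chosen, weights))
    return output, chosen

# Hypothetical toy setup: 8 "experts", each multiplying its input by a constant.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Made-up router scores for one token (one score per expert).
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

output, chosen = moe_layer(3.0, experts, router_scores, k=2)
print("experts that ran:", chosen)
print("layer output:", output)
```

With k=2 out of 8 experts, only a quarter of the expert parameters do any work for this token, which is where the savings come from.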

In practice

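A 200B-parameter MoE might activate only about 20B parameters per token, so the compute per token is closer to that of a 20B model. All 200B parameters still have to be stored and loaded, so the savings are in compute rather than memory.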
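The arithmetic behind that example can be sketched as follows. The 200B and 20B figures are the illustrative ones from the text. The rule of thumb of about 2 FLOPs per active parameter per token and the 2 bytes per parameter (16-bit weights) are assumptions added here for a back-of-the-envelope estimate, not measurements of any particular model.

```python
# Illustrative figures from the example above.
total_params = 200 * 10**9   # all parameters in the MoE model
active_params = 20 * 10**9   # parameters actually used for one token

# Share of the model that does work for a single token.
active_fraction = active_params / total_params

# Assumption: a forward pass costs roughly 2 FLOPs per active parameter per token.
flops_moe = 2 * active_params
flops_dense = 2 * total_params   # a dense model of the same total size

# Assumption: 16-bit weights, i.e. 2 bytes per parameter.
# Memory depends on total parameters, not active ones.
bytes_per_param = 2
weight_gb = total_params * bytes_per_param / 10**9

print(f"active fraction: {active_fraction:.0%}")
print(f"dense / MoE compute per token: {flops_dense / flops_moe:.0f}x")
print(f"weights to hold in memory: {weight_gb:.0f} GB")
```

The compute per token falls with the active fraction, while the memory needed for weights is set by the full parameter count.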

Related terms

Parameters, Inference

Image: Jakub Pabis on Pexels. Definition free to reuse under CC BY 4.0.