Performance
Throughput
How much work a model can do per unit of time, typically measured as total tokens served per second across all concurrent requests. It matters most at scale, where many requests share the same hardware.
In practice
High throughput lets you serve thousands of users on the same model at once.
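To make the definition concrete, here is a minimal sketch of the arithmetic: total tokens served divided by the length of the measurement window. The `Request` record, the `aggregate_throughput` helper, and all the numbers are hypothetical, chosen only for illustration; real serving stacks report this metric in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start_s: float  # when the request began, in seconds
    end_s: float    # when its last token was returned, in seconds
    tokens: int     # tokens generated for this request


def aggregate_throughput(requests: list[Request]) -> float:
    """Total tokens served per second across all requests in the window."""
    if not requests:
        return 0.0
    # The window runs from the earliest start to the latest finish.
    window_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    if window_s <= 0:
        raise ValueError("measurement window must be positive")
    return sum(r.tokens for r in requests) / window_s


# Hypothetical numbers: four overlapping requests inside a 10-second window.
requests = [
    Request(0.0, 8.0, 400),
    Request(1.0, 9.0, 400),
    Request(2.0, 10.0, 400),
    Request(0.0, 10.0, 800),
]
print(aggregate_throughput(requests))  # 200.0
```

In this made-up example each user sees only 50 to 80 tokens per second on their own request, while the system as a whole serves 200 tokens per second. That summed figure is the throughput.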
Image: Mathias Reding on Pexels. Definition free to reuse under CC BY 4.0.