Performance concept illustration

Performance

Throughput

How much work a model can do per unit of time, such as total tokens served per second across all requests. Matters most at scale.

In practice

High throughput lets you serve thousands of users on the same model at once.

Related terms

See what your tokens really cost

Track usage and spend across every model and platform, free.

Image: Mathias Reding on Pexels. Definition free to reuse under CC BY 4.0.