Fundamentals

Inference

Running a trained model to get an answer, as opposed to training it. Every API call you make is an inference. This is what providers charge you for per token.

In practice

Asking a model to draft a reply is inference. The one-time cost of building the model was training.

Related terms

Training Latency

See what your tokens really cost

Track usage and spend across every model and platform, free.

Start tracking free See the AI Cost Index

Image: Google DeepMind on Pexels. Definition free to reuse under CC BY 4.0.