Decoding & sampling

Max tokens

A cap on how many tokens the model is allowed to generate in one response. Generation stops as soon as the cap is reached, even mid-sentence, so it bounds both the length and the cost of an answer.

In practice

Set max tokens to 300 so a summary cannot run long and run up the bill.
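The effect of the cap can be sketched with a toy generation loop. This is a minimal illustration rather than any provider's actual API: `generate`, `worst_case_output_cost`, the `"length"` and `"stop"` labels, and the price used below are all hypothetical names and values chosen for the example.

```python
from typing import Iterable, List, Tuple


def generate(token_stream: Iterable[str], max_tokens: int) -> Tuple[List[str], str]:
    """Collect tokens until the stream ends or the cap is hit.

    Returns the tokens kept and a reason label (hypothetical names):
    "length" if the cap cut the output off, "stop" if the stream ended first.
    """
    output: List[str] = []
    for token in token_stream:
        if len(output) >= max_tokens:
            return output, "length"  # cap reached; the rest is never generated
        output.append(token)
    return output, "stop"  # the model finished on its own


def worst_case_output_cost(max_tokens: int, price_per_1k_tokens: float) -> float:
    """Upper bound on output cost for one response, given an assumed price."""
    return max_tokens / 1000 * price_per_1k_tokens


# A summary that would run to 500 tokens is cut at 300.
long_summary = ["tok"] * 500
kept, reason = generate(long_summary, max_tokens=300)
print(len(kept), reason)

# With an assumed price per 1,000 output tokens, the cap also caps the bill.
print(worst_case_output_cost(300, price_per_1k_tokens=0.002))
```

Because the cap truncates rather than shortens the answer, a response that ends for the "length" reason may stop mid-thought; asking for a shorter answer in the prompt is a separate step from setting the cap.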

Image: Codioful on Pexels. Definition free to reuse under CC BY 4.0.