Training & tuning

Quantization

Shrinking a model by storing its weights at lower numeric precision, for example as 8-bit or 4-bit integers instead of 16-bit or 32-bit floating-point values. This cuts memory use and speeds up inference, usually at a small cost in quality.
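
To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor int8 quantization, in plain Python. The function names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions, not any particular library's API; production toolkits typically use finer-grained scales and more elaborate calibration.

```python
def quantize_int8(weights):
    """Map float weights to integer codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight now needs one byte plus a share of the single stored scale, and the rounding error is the source of the quality trade-off.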

In practice

A quantized model can run on a laptop that does not have enough memory to load the full-precision version.
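
A rough back-of-the-envelope calculation shows why. The sketch below assumes a hypothetical model with 7 billion parameters and counts only the weights; real quantized formats add a little overhead for scales, and activations and caches need memory on top of this.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
params = 7_000_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions the weights take about 14 GB at 16-bit precision and about 3.5 GB at 4-bit, which is the difference between exceeding and fitting within the memory of many laptops.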

Image: panumas nikhomkhai on Pexels. Definition free to reuse under CC BY 4.0.