Training & tuning
Quantization
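Shrinking a model by storing its weights at lower numeric precision (for example, 8-bit or 4-bit integers instead of 16- or 32-bit floating-point numbers), which cuts memory use and often speeds up inference, usually at a small cost in output quality.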
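To make the idea concrete, here is a minimal sketch of one simple scheme: symmetric 8-bit quantization with a single shared scale for the whole weight list. The function names and sample weights are illustrative assumptions, not taken from any particular library. Real toolchains typically use finer-grained scales and more elaborate rounding.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.003, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now fits in one byte instead of four, and the gap between `weights` and `restored` is the quality trade-off the definition refers to.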
In practice
A quantized model can run on a laptop that does not have enough memory to load the full-precision version.
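The arithmetic behind that claim can be sketched as follows. The 7-billion-parameter size is a hypothetical example, and the estimate covers weight storage only. It ignores activations, caches, and the small overhead of storing quantization scales.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
params = 7_000_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions the weights take about 14 GB at 16-bit precision and about 3.5 GB at 4-bit, which is the difference between exceeding and fitting within the memory of many laptops.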
Image: panumas nikhomkhai on Pexels. Definition free to reuse under CC BY 4.0.