Decoding & sampling
Max tokens
A cap on how many tokens the model is allowed to generate in one response. It bounds both the length and the cost of an answer.
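The sketch below shows where such a cap sits in a token-by-token decoding loop. It assumes a hypothetical `next_token` callable that stands in for one model step (context in, next token id out) and a hypothetical `eos_token` id. It is an illustration, not any particular library's API. In this sketch the cap counts only newly generated tokens, not the prompt. Conventions differ between libraries: Hugging Face `generate`, for example, has both `max_length` (total) and `max_new_tokens`.

```python
from typing import Callable, List


def generate(
    prompt_tokens: List[int],
    next_token: Callable[[List[int]], int],  # hypothetical model step: context -> next token id
    max_tokens: int,
    eos_token: int,  # hypothetical end-of-sequence id
) -> List[int]:
    """Decode until the model emits EOS or max_tokens new tokens have been produced."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    context = list(prompt_tokens)
    output: List[int] = []
    # The cap bounds the loop, so output length (and per-token cost) can never exceed it.
    while len(output) < max_tokens:
        tok = next_token(context)
        if tok == eos_token:
            break  # natural stop before the cap; EOS itself is not counted or returned
        output.append(tok)
        context.append(tok)
    return output


# A toy "model" that never emits EOS, so only the cap stops generation.
print(generate([0, 1, 2], lambda ctx: len(ctx), max_tokens=5, eos_token=-1))
```

Because the toy model never produces EOS, the loop runs until it reaches the cap and returns exactly five tokens. With a real model the answer may end earlier at EOS, or be cut off mid-sentence when the cap is reached.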