Training & tuning

Reinforcement learning from human feedback

Q: What does RLHF stand for?

RLHF stands for reinforcement learning from human feedback. It is a tuning method in which humans rank model outputs and the model learns to prefer the highly rated ones. This is a big reason chat models feel helpful and polite.

In practice

RLHF teaches a model to refuse harmful requests and to answer the way people prefer.
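
The "humans rank outputs, the model learns to prefer the highly rated ones" idea can be illustrated with a toy reward model. The sketch below is not any particular lab's implementation. It assumes each output is summarized by two made-up numeric features (politeness and helpfulness), assumes a linear reward function, and uses a common pairwise preference loss, -log(sigmoid(reward_preferred - reward_rejected)). All names and data here are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Linear reward model: a weighted sum of the output's features.
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, preferred, rejected):
    # Low when the preferred output scores higher than the rejected one.
    margin = reward(weights, preferred) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train(pairs, n_features, lr=0.5, steps=200):
    weights = [0.0] * n_features
    for _ in range(steps):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Gradient descent step on -log(sigmoid(margin)).
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights

# Hypothetical human rankings: (preferred output, rejected output),
# each described by [politeness, helpfulness] scores.
pairs = [
    ([0.9, 0.8], [0.2, 0.3]),
    ([0.7, 0.9], [0.6, 0.1]),
    ([0.8, 0.6], [0.1, 0.5]),
]

weights = train(pairs, n_features=2)

for preferred, rejected in pairs:
    print(reward(weights, preferred) > reward(weights, rejected))
```

This covers only the step of learning from rankings. In a full RLHF pipeline, the language model itself is then tuned with reinforcement learning to produce outputs that score well under a learned reward model like this one.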

Related terms

Alignment, Instruct model

Image: panumas nikhomkhai on Pexels. Definition free to reuse under CC BY 4.0.