Training language models to follow instructions with human feedback

This paper shows a method for aligning language models with user intent across a wide range of tasks by fine-tuning them with human feedback.

Starting from labeler-written prompts and prompts submitted through the OpenAI API, the authors collect a dataset of labeler demonstrations of the desired model behavior, which they use to fine-tune GPT-3 with supervised learning.
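
To make this step concrete, here is a minimal sketch of supervised fine-tuning on demonstration data, assuming the Hugging Face transformers and datasets libraries (not the paper's actual code). The file demonstrations.jsonl with "prompt" and "completion" fields is a hypothetical layout for the labeler demonstrations, and "gpt2" stands in for GPT-3, whose weights are not public.

```python
# A minimal sketch of the supervised fine-tuning (SFT) step, assuming the
# Hugging Face transformers and datasets APIs. "demonstrations.jsonl" (with
# "prompt" and "completion" fields) is a hypothetical file of labeler
# demonstrations, and "gpt2" stands in for GPT-3.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(example):
    # Train on the concatenated prompt + labeler-written completion,
    # with standard next-token prediction (labels = input ids).
    text = example["prompt"] + example["completion"]
    out = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()
    return out

data = (load_dataset("json", data_files="demonstrations.jsonl")["train"]
        .map(tokenize, remove_columns=["prompt", "completion"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-model", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=data,
)
trainer.train()
```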

They then collect a dataset of human rankings of model outputs and use it to train a reward model, against which the supervised model is further fine-tuned with reinforcement learning from human feedback (PPO). The resulting model is called InstructGPT.
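
The reward model assigns a scalar score to a (prompt, completion) pair and is trained so that completions labelers preferred score higher. Below is a minimal PyTorch sketch of the paper's pairwise ranking loss; reward_model, chosen_ids, and rejected_ids are hypothetical placeholders, not names from the paper.

```python
# A minimal PyTorch sketch of the reward model's pairwise ranking loss.
# "reward_model" is a hypothetical module mapping a tokenized
# (prompt, completion) sequence to a scalar reward; "chosen_ids" /
# "rejected_ids" hold the labeler-preferred and lower-ranked completions.
import torch.nn.functional as F

def pairwise_ranking_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)      # r(x, y_w): preferred output
    r_rejected = reward_model(rejected_ids)  # r(x, y_l): dispreferred output
    # Maximize the log-probability that the preferred completion scores
    # higher: loss = -log sigmoid(r(x, y_w) - r(x, y_l)).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The RL step then fine-tunes the supervised model with PPO to maximize this learned reward, with a per-token KL penalty toward the SFT model that keeps the policy from drifting too far from it.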

Powerpoint for this talk
