Training language models to follow instructions with human feedback

AI-generated keywords: InstructGPT, Human Feedback, Language Models, Alignment, Fine-Tuning

AI-generated Key Points

The paper's license does not allow us to build upon its content, so the key points are generated from the paper's metadata rather than the full article.

  • Simply making language models bigger does not necessarily make them better at following a user's intent.
  • Large language models can generate outputs that are untruthful, toxic, or not helpful to the user.
  • The authors propose an approach for aligning language models with user intent on various tasks by fine-tuning them with human feedback.
  • They collect a dataset of labeler demonstrations of desired model behavior and use it to fine-tune GPT-3 using supervised learning.
  • They also collect a dataset of rankings of model outputs and further fine-tune this supervised model using reinforcement learning from human feedback, resulting in InstructGPT models.
  • In human evaluations on the authors' prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred over those from the 175B GPT-3, even though the InstructGPT model has 100x fewer parameters.
  • InstructGPT models exhibit improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets.
  • Fine-tuning with human feedback is a promising direction for aligning language models with human intent on various tasks.
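The key points mention a dataset of rankings of model outputs but, being generated from metadata only, do not say how rankings become a training signal. A minimal sketch of one common formulation follows: expand each ranking into (preferred, rejected) pairs and penalize a reward model with a pairwise logistic (Bradley-Terry style) loss. The loss choice, the function names, and the `reward_fn` callable are assumptions for illustration, not details confirmed by this page.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the labeler ranked higher.
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(margin)) for one comparison (assumed loss)."""
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)), split by sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_ranking_loss(ranked_outputs, reward_fn):
    """Average the pairwise loss over all pairs implied by one ranking."""
    pairs = ranking_to_pairs(ranked_outputs)
    losses = [pairwise_loss(reward_fn(w), reward_fn(l)) for w, l in pairs]
    return sum(losses) / len(losses)
```

With a hypothetical scoring table such as `scores = {"a": 2.0, "b": 1.0, "c": 0.0}`, calling `mean_ranking_loss(["a", "b", "c"], scores.get)` gives a lower loss than the reversed ranking `["c", "b", "a"]`, because the scores agree with the first ordering. In a real pipeline `reward_fn` would be a learned model, and the loss would be minimized by gradient descent rather than only evaluated.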

Authors: Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Abstract: Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
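The "100x fewer parameters" figure in the abstract reads as an order-of-magnitude statement. A quick check using only the two parameter counts quoted there (the variable names are illustrative):

```python
# Parameter counts as quoted in the abstract.
gpt3_params = 175e9          # 175B GPT-3
instructgpt_params = 1.3e9   # 1.3B InstructGPT

# How many times larger GPT-3 is than the 1.3B InstructGPT model.
ratio = gpt3_params / instructgpt_params
print(f"GPT-3 is about {ratio:.1f}x larger")
```

The ratio comes out between 134 and 135, so "100x" is a rounded-down description of the gap rather than an exact factor.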

Submitted to arXiv on 04 Mar. 2022

Results of the summarizing process for the arXiv paper: 2203.02155v1

This paper's license does not allow us to build upon its content, so this summary was produced from the paper's metadata rather than the full article.

The paper "Training language models to follow instructions with human feedback" argues that making language models bigger does not inherently make them better at following a user's intent. Large language models can generate outputs that are untruthful, toxic, or not helpful to the user, which means they are not aligned with their users. The authors propose aligning language models with user intent across a wide range of tasks by fine-tuning them with human feedback. Starting from labeler-written prompts and prompts submitted through the OpenAI API, they collect a dataset of labeler demonstrations of the desired model behavior and use it to fine-tune GPT-3 with supervised learning. They then collect a dataset of rankings of model outputs and use it to further fine-tune the supervised model with reinforcement learning from human feedback, producing the InstructGPT models. In human evaluations on the authors' prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred over those from the 175B GPT-3, even though the InstructGPT model has 100x fewer parameters. InstructGPT models also show improvements in truthfulness and reductions in toxic output generation, with minimal performance regressions on public NLP datasets. Although InstructGPT still makes simple mistakes, the results indicate that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
Created on 10 Apr. 2023

