IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

AI-generated keywords: Large Language Models, Complex Instruction Following, Benchmarking, IOPO Alignment Method, Empowering LLMs

AI-generated Key Points

  • Large Language Models (LLMs) require the ability to accurately follow complex instructions for various tasks
  • TRACE benchmark introduced to enhance and evaluate LLMs' instruction-following capabilities
  • TRACE includes 120K training data points and 1K evaluation data points for comprehensive testing
  • IOPO alignment method proposed to improve how LLMs align with response preferences and explore instruction preferences in detail
  • Extensive experiments show that IOPO improves on SFT and DPO by 8.15% and 2.18% respectively on in-domain data, and by 6.29% and 3.13% respectively on out-of-domain data
  • Research contributes valuable insights into empowering LLMs with enhanced complex instruction following abilities through innovative benchmarking and alignment techniques like TRACE and IOPO

Authors: Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, Yongbin Li

ACL 2025
License: CC BY 4.0

Abstract: In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions is rapidly increasing. However, on the one hand, there is only a certain amount of complex instruction evaluation data; on the other hand, there are no dedicated algorithms to improve the ability to follow complex instructions. To this end, this paper introduces TRACE, a benchmark for improving and evaluating the complex instruction-following ability, which consists of 120K training data and 1K evaluation data. Furthermore, we propose the IOPO (Input-Output Preference Optimization) alignment method, which takes both input and output preference pairs into consideration, where LLMs not only rapidly align with response preferences but also meticulously explore the instruction preferences. Extensive experiments on both in-domain and out-of-domain datasets confirm the effectiveness of IOPO, showing 8.15%, 2.18% improvements on in-domain data and 6.29%, 3.13% on out-of-domain data compared to SFT and DPO respectively.

Submitted to arXiv on 09 Nov. 2024

Results of the summarizing process for the arXiv paper: 2411.06208v3

As more agents and applications rely on Large Language Models (LLMs), the ability to accurately follow complex instructions is becoming increasingly important, and the complexity of those instructions is growing quickly. The paper identifies two gaps: evaluation data for complex instructions is limited, and no dedicated algorithm exists for improving complex instruction following. To address the first gap, it introduces TRACE, a benchmark for both improving and evaluating the ability of LLMs to follow complex instructions, with 120K training data points and 1K evaluation data points. To address the second, it proposes the IOPO (Input-Output Preference Optimization) alignment method, which considers both input and output preference pairs, so that LLMs align with response preferences while also exploring instruction preferences in detail. Experiments on in-domain and out-of-domain datasets support the effectiveness of IOPO: on in-domain data it improves by 8.15% over SFT and 2.18% over DPO, and on out-of-domain data by 6.29% over SFT and 3.13% over DPO. Overall, the work contributes a benchmark (TRACE) and an alignment technique (IOPO) for strengthening complex instruction following, which could improve how LLMs handle intricate, instruction-driven tasks in real-world applications.
Created on 22 Jul. 2025
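
The summary describes IOPO only at a high level (preference pairs over both inputs and outputs), so the following is a rough sketch of that idea rather than the paper's actual objective. It starts from the standard DPO loss, which compares two responses under one instruction, and adds a hypothetical symmetric term that compares two instructions under one response. The function names, the pairing of two instructions `x1`, `x2` with their matching responses `y1`, `y2`, and the equal-weight averaging of the four margins are all assumptions made for illustration.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float) -> float:
    """DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def dpo_loss(r_chosen: float, r_rejected: float) -> float:
    """Standard DPO loss for one instruction and a (chosen, rejected) response pair."""
    return -log_sigmoid(r_chosen - r_rejected)


def input_output_loss(r: dict) -> float:
    """Illustrative input-output preference loss (an assumption, not the paper's formula).

    r[(i, j)] is the implicit reward of response y_j under instruction x_i,
    for i, j in {1, 2}, where y_k is the response written for x_k.
    """
    # Output preference: for a fixed instruction, prefer its matching response.
    out_1 = r[(1, 1)] - r[(1, 2)]
    out_2 = r[(2, 2)] - r[(2, 1)]
    # Input preference: for a fixed response, prefer its matching instruction.
    in_1 = r[(1, 1)] - r[(2, 1)]
    in_2 = r[(2, 2)] - r[(1, 2)]
    # Hypothetical choice: weight the four margins equally.
    margin = (out_1 + out_2 + in_1 + in_2) / 4.0
    return -log_sigmoid(margin)


if __name__ == "__main__":
    beta = 0.1
    # Made-up log-probabilities, purely for demonstration.
    rewards = {
        (1, 1): implicit_reward(-10.0, -12.0, beta),
        (1, 2): implicit_reward(-15.0, -13.0, beta),
        (2, 1): implicit_reward(-16.0, -14.0, beta),
        (2, 2): implicit_reward(-9.0, -11.0, beta),
    }
    print("DPO loss on (x1; y1 vs y2):", dpo_loss(rewards[(1, 1)], rewards[(1, 2)]))
    print("Input-output loss:", input_output_loss(rewards))
```

When all four rewards are equal, both losses reduce to log 2, the value for a model with no preference. The input-output variant falls below that only when matched instruction-response pairs score higher than mismatched ones in both directions, which is one way to read "exploring instruction preferences" alongside response preferences.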
