OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

AI-generated keywords: OneRec

AI-generated Key Points

- OneRec is a unified end-to-end generative framework for single-stage recommendation
- Incorporates an encoder-decoder architecture and scales up model parameters based on a sparse Mixture-of-Experts (MoE) structure
- Adopts a session-wise list generation approach that considers the content and order of items within each session
- Explores preference learning through direct preference optimization (DPO) using self-hard rejected samples from beam search results
- Implements an Iterative Preference Alignment (IPA) strategy to rank sampled responses based on pre-trained reward model scores
- Extensive experiments demonstrate superiority in generating high-quality recommendations
- Successfully deployed in Kuaishou, resulting in a significant 1.6% increase in watch-time

Authors: Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, Guorui Zhou

arXiv: 2502.18965v1 (cs.IR)

License: CC BY 4.0

Abstract: Recently, generative retrieval-based recommendation systems have emerged as a promising paradigm. However, most modern recommender systems adopt a retrieve-and-rank strategy, where the generative model functions only as a selector during the retrieval stage. In this paper, we propose OneRec, which replaces the cascaded learning framework with a unified generative model. To the best of our knowledge, this is the first end-to-end generative model that significantly surpasses current complex and well-designed recommender systems in real-world scenarios. Specifically, OneRec includes: 1) an encoder-decoder structure, which encodes the user's historical behavior sequences and gradually decodes the videos that the user may be interested in. We adopt sparse Mixture-of-Experts (MoE) to scale model capacity without proportionally increasing computational FLOPs. 2) a session-wise generation approach. In contrast to traditional next-item prediction, we propose a session-wise generation, which is more elegant and contextually coherent than point-by-point generation that relies on hand-crafted rules to properly combine the generated results. 3) an Iterative Preference Alignment module combined with Direct Preference Optimization (DPO) to enhance the quality of the generated results. Unlike DPO in NLP, a recommendation system typically has only one opportunity to display results for each user's browsing request, making it impossible to obtain positive and negative samples simultaneously. To address this limitation, we design a reward model to simulate user generation and customize the sampling strategy. Extensive experiments have demonstrated that a limited number of DPO samples can align user interest preferences and significantly improve the quality of generated results. We deployed OneRec in the main scene of Kuaishou, achieving a 1.6% increase in watch-time, which is a substantial improvement.

Submitted to arXiv on 26 Feb. 2025

Results of the summarizing process for the arXiv paper: 2502.18965v1

Comprehensive Summary

This paper introduces OneRec, a unified end-to-end generative framework for single-stage recommendation, designed to address the challenges faced by traditional recommendation systems. Inspired by the scaling laws observed in training large language models, OneRec incorporates an encoder-decoder architecture and scales up its model parameters based on a sparse Mixture-of-Experts (MoE) structure. This allows OneRec to effectively capture user interests. Unlike conventional point-by-point prediction methods, OneRec adopts a session-wise list generation approach that considers both the relative content and order of items within each session. This eliminates the need for hand-crafted strategies and allows the model to autonomously learn optimal session structures from input data.

To further improve the quality of generated recommendations, OneRec explores preference learning through direct preference optimization (DPO). To construct preference pairs, self-hard rejected samples are created from beam search results instead of random sampling. An Iterative Preference Alignment (IPA) strategy ranks sampled responses based on scores provided by a pre-trained reward model (RM), identifying the best-chosen and worst-rejected samples.

Extensive experiments conducted on large-scale industry datasets demonstrate the superiority of OneRec in generating high-quality recommendations. The framework was successfully deployed in Kuaishou, a popular short video recommendation platform with millions of daily active users, resulting in a significant 1.6% increase in watch-time. Overall, OneRec represents a novel approach to recommendation systems that combines advanced generative modeling techniques with innovative preference alignment strategies to deliver superior performance in real-world scenarios.

Layman's Summary

OneRec is a special tool that helps suggest things for you to watch or use. It uses a special way of organizing information and making decisions based on different experts. It looks at what you like and how things are arranged before suggesting something to you. It tries hard to understand your preferences and improve its suggestions over time. Many tests show that it is very good at giving recommendations, and it has been used successfully in a popular app, helping people watch more videos.

Definitions

- Unified: Bringing together different parts into one system.
- Generative: Creating or producing something new.
- Framework: A structure or set of rules for doing something.
- Recommendation: Suggesting something that might be useful or interesting.
- Encoder-decoder architecture: A model with two parts: one reads the user's history, and the other produces the recommendations step by step.
- Sparse Mixture-of-Experts (MoE) structure: A model made up of many specialized sub-networks ("experts"), of which only a few are used for each input.
- Session-wise list generation approach: Creating a whole ordered list of items for a session at once, rather than one item at a time.
- Preference learning: Understanding what someone likes or prefers.
- Direct preference optimization (DPO): Training the model on pairs of outputs, one preferred and one not, so that it favors the preferred one.
- Self-hard rejected samples: Weaker candidates produced by the model itself, used as the "not preferred" side of a pair.
- Beam search results: The candidates found by a search method that explores several promising paths simultaneously.
- Iterative Preference Alignment (IPA) strategy: Ranking the model's candidates with a reward model and training on the best and worst ones, round after round.
- Pre-trained reward model scores: Ratings given to suggestions based on

Introduction

Recommendation systems have become an integral part of our daily lives, helping us discover new products, services, and content that align with our interests. However, most modern recommender systems are complex cascaded pipelines that retrieve candidates first and rank them afterwards. To replace this cascade with a single model, the authors have developed OneRec, a unified end-to-end generative framework for single-stage recommendation that has been deployed at Kuaishou. In this blog article, we will dive into the details of the research paper "OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment" and explore how this approach could change the way recommendations are produced.

The Challenges Faced by Traditional Recommendation Systems

According to the paper's abstract, most modern recommender systems adopt a cascaded retrieve-and-rank strategy: a retrieval stage first selects a set of candidates, and later stages rank them. In this setup, generative models have so far functioned only as selectors during the retrieval stage, so the final recommendation still depends on separate downstream stages. Traditional generative approaches also predict one next item at a time, and these point-by-point results must then be combined into a list using hand-crafted rules, which makes it harder to keep a session coherent in content and order.

The Solution: OneRec Framework

To overcome these challenges, the researchers developed OneRec, a unified end-to-end generative framework for single-stage recommendation that replaces the cascaded pipeline with a single generative model. Its encoder-decoder architecture encodes the user's historical behavior sequences and gradually decodes the videos the user may be interested in. Inspired by the scaling laws observed in training large language models, OneRec scales up its parameters with a sparse Mixture-of-Experts (MoE) structure, which increases model capacity without proportionally increasing computational FLOPs.
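To make the "capacity without proportional FLOPs" idea concrete, here is a minimal sketch of top-k sparse MoE routing in plain Python. It is not OneRec's implementation: the dimensions, the linear experts, the softmax-over-selected-logits gating, and the names `make_expert` and `moe_forward` are all assumptions chosen for illustration. The point it shows is that only `top_k` of the experts are evaluated per input, however many experts exist.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Hypothetical expert: a single dense dim x dim linear map."""
    w = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

    return expert


def moe_forward(x, gate_w, experts, top_k):
    """Route x to the top_k experts with the highest gate logits.

    Only the selected experts are evaluated, so per-input compute
    grows with top_k rather than with len(experts).
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_w]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy configuration (assumed values, not taken from the paper).
rng = random.Random(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
experts = [make_expert(DIM, rng) for _ in range(NUM_EXPERTS)]
gate_w = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
output, used = moe_forward([1.0, 0.5, -0.3, 2.0], gate_w, experts, TOP_K)
```

In this toy setup, raising `NUM_EXPERTS` adds parameters, while the number of expert evaluations per input stays at `TOP_K`; only the small gating computation grows with the number of experts.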

Session-Wise List Generation Approach

One of the key features of OneRec is its session-wise list generation approach. Unlike traditional point-by-point prediction methods, which predict one next item at a time and rely on hand-crafted rules to combine the results, OneRec generates the whole list for a session and considers both the relative content and order of the items within it. This eliminates the need for hand-crafted strategies and allows the model to autonomously learn optimal session structures from input data.
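The difference between the two formulations can be illustrated by how training targets might be built from the same log. The sketch below is an assumption-laden toy: the function names, the tuple-based example format, and the item IDs are hypothetical, and the paper's actual data pipeline is not described in this summary.

```python
def next_item_examples(history, session):
    """Point-by-point framing: one (context, next item) example per step."""
    examples = []
    context = list(history)
    for item in session:
        examples.append((tuple(context), item))
        context.append(item)
    return examples


def session_wise_example(history, session):
    """Session-wise framing: one example whose target is the whole ordered list."""
    return (tuple(history), tuple(session))


# Hypothetical item IDs, for illustration only.
history = ["v1", "v2"]
session = ["v3", "v4", "v5"]

pointwise = next_item_examples(history, session)
listwise = session_wise_example(history, session)
```

Here `pointwise` holds three separate examples, each with a single-item target, while `listwise` holds one example whose target preserves the content and order of the full session.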

Preference Learning through Direct Preference Optimization (DPO)

To further improve the quality of generated recommendations, OneRec explores preference learning through direct preference optimization (DPO). Unlike DPO in NLP, a recommendation system typically has only one opportunity to display results for each browsing request, so positive and negative samples cannot be obtained for the same request. OneRec therefore uses a pre-trained reward model (RM) to score candidate sessions. The candidates come from the model's own beam search results rather than from random sampling, which yields self-hard rejected samples. The Iterative Preference Alignment (IPA) strategy ranks these candidates by RM score and takes the best as the chosen sample and the worst as the rejected sample to form each preference pair. According to the abstract, a limited number of DPO samples is enough to align the model with user interest preferences and significantly improve the quality of the generated results.
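The pair-construction step and the loss it feeds can be sketched as follows. This is a minimal illustration, not the paper's code: the candidate sessions, the per-item scores, and the additive `toy_reward` are hypothetical stand-ins for beam search output and the pre-trained reward model, and `dpo_loss` is the standard DPO objective, which may differ in detail from the variant OneRec uses.

```python
import math


def build_preference_pair(candidates, reward_fn):
    """Rank candidate sessions by reward; return (chosen, rejected)."""
    ranked = sorted(candidates, key=reward_fn)
    return ranked[-1], ranked[0]


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    margin = beta * [(policy - reference log-prob of chosen)
                     - (policy - reference log-prob of rejected)]
    Computed as a stable softplus of -margin.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical per-item scores standing in for a reward model.
ITEM_SCORES = {"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.2, "e": 0.1, "f": 0.05}


def toy_reward(session):
    """Assumed reward: sum of per-item scores (illustration only)."""
    return sum(ITEM_SCORES[item] for item in session)


# Hypothetical sessions, as if returned by beam search.
beam_candidates = [("a", "b", "c"), ("a", "d", "e"), ("d", "e", "f")]
chosen, rejected = build_preference_pair(beam_candidates, toy_reward)

# Toy log-probabilities for the policy and the frozen reference model.
loss = dpo_loss(logp_chosen=-1.0, logp_rejected=-3.0,
                ref_logp_chosen=-2.0, ref_logp_rejected=-2.0)
```

In an iterative scheme, one would regenerate candidates with the updated model, rebuild the pairs, and train again; that outer loop is omitted here because the summary does not give its details.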

Experimental Results

The authors report extensive experiments on large-scale industry datasets, which demonstrate the superiority of OneRec in generating high-quality recommendations. OneRec was also deployed in the main scene of Kuaishou, a popular short video platform with millions of daily active users, where it achieved a 1.6% increase in watch-time, which the authors describe as a substantial improvement.

Conclusion

In conclusion, "OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment" presents an approach to recommendation systems that combines generative modeling with preference alignment. The framework has shown superior performance in real-world scenarios and has been successfully deployed at Kuaishou. By replacing the cascaded retrieve-and-rank pipeline with a single generative model, OneRec has the potential to change the way recommendations are produced. We look forward to seeing how this framework evolves and impacts the field of recommendation systems in the future.

Created on 22 Jul. 2025
