Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?

AI-generated keywords: SASRec, BERT4Rec, cross-entropy, negative sampling, performance

AI-generated Key Points

  • Sequential recommendations and next-item prediction tasks are popular in recommender systems.
  • SASRec and BERT4Rec are state-of-the-art Transformer-based models for these tasks.
  • Most previous comparisons report that BERT4Rec performs better than SASRec.
  • The key difference between the models is their loss function: BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling with binary cross-entropy computed on one positive and one negative item.
  • This work investigates the impact of using the same loss function as BERT4Rec on SASRec's performance.
  • Surprisingly, experiments show that when trained with BERT4Rec's loss function, SASRec outperforms BERT4Rec in terms of quality and training speed.
  • This challenges the prevailing notion that BERT4Rec is superior to SASRec.
  • SASRec can also be trained with negative sampling and still outperform BERT4Rec, provided the number of negative examples is much larger than one.
  • With enough negatives, negative sampling can be an effective strategy for training SASRec.
  • Overall, SASRec outperforms BERT4Rec when trained either with BERT4Rec's loss function (better quality and faster training) or with negative sampling using many negative examples.
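The loss functions contrasted in the key points can be written out as follows. This is a sketch in our own notation, not the paper's. h_t is the model output at training position t. e_i is the embedding of item i, and s_{t,i} = h_t^\top e_i is its score. \mathcal{I} is the item catalog, i_t^+ is the true next item, and i_{t,k}^- are sampled negative items. \mathcal{T} is the set of training positions: every position of the sequence for SASRec, and the masked positions for BERT4Rec. The third formula, a cross-entropy over the positive and N sampled negatives, is an assumption. The key points only say that the number of negatives should be much larger than one, not which sampled loss is used.

```latex
% Cross-entropy over a softmax across all items (BERT4Rec's loss)
\mathcal{L}_{\mathrm{CE}} = -\sum_{t \in \mathcal{T}} \log \frac{\exp(s_{t,i_t^+})}{\sum_{j \in \mathcal{I}} \exp(s_{t,j})}

% Binary cross-entropy with one sampled negative per positive (original SASRec loss)
\mathcal{L}_{\mathrm{BCE}} = -\sum_{t \in \mathcal{T}} \left[ \log \sigma(s_{t,i_t^+}) + \log\bigl(1 - \sigma(s_{t,i_t^-})\bigr) \right]

% Assumed form: cross-entropy over the positive and N sampled negatives
\mathcal{L}_{\mathrm{CE}}^{(N)} = -\sum_{t \in \mathcal{T}} \log \frac{\exp(s_{t,i_t^+})}{\exp(s_{t,i_t^+}) + \sum_{k=1}^{N} \exp(s_{t,i_{t,k}^-})}
```

Here \sigma is the logistic sigmoid. If the N negatives are all |\mathcal{I}| - 1 other items, \mathcal{L}_{\mathrm{CE}}^{(N)} reduces exactly to \mathcal{L}_{\mathrm{CE}}.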

Authors: Anton Klenitskiy, Alexey Vasilev

License: CC BY 4.0

Abstract: Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, the two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, quite a few publications have compared these two algorithms and proposed new state-of-the-art models. In most of these publications, BERT4Rec achieves better performance than SASRec. However, BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling and computes binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, the one used by BERT4Rec, then SASRec significantly outperforms BERT4Rec in terms of both quality and training speed. In addition, we show that SASRec can be trained effectively with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.

Submitted to arXiv on 14 Sep. 2023
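The sketch below makes the loss difference described in the abstract concrete for a single prediction step. It is minimal Python and assumes the model has already produced one score per catalog item as a plain list of floats. The function names `full_softmax_ce` and `bce_one_negative` are hypothetical helpers and are not taken from the paper's code. Batching, masking, and the Transformer itself are omitted.

```python
import math
from typing import Sequence


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) via max subtraction."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def full_softmax_ce(scores: Sequence[float], positive: int) -> float:
    """Hypothetical helper: cross-entropy over a softmax across all items (BERT4Rec-style loss)."""
    # -log softmax(scores)[positive] = logsumexp(scores) - scores[positive]
    return log_sum_exp(scores) - scores[positive]


def bce_one_negative(scores: Sequence[float], positive: int, negative: int) -> float:
    """Hypothetical helper: BCE on one positive and one sampled negative (original SASRec loss)."""
    # -log sigmoid(s_pos) = softplus(-s_pos); -log(1 - sigmoid(s_neg)) = softplus(s_neg)
    return softplus(-scores[positive]) + softplus(scores[negative])


if __name__ == "__main__":
    scores = [2.0, 0.5, 0.1, -1.0, 1.5]  # toy scores for a 5-item catalog; item 0 is the positive
    harder = [2.0, 0.5, 0.1, -1.0, 3.0]  # item 4 now outscores the positive item
    # Item 3 plays the role of the single sampled negative in the BCE loss.
    print(full_softmax_ce(scores, 0), full_softmax_ce(harder, 0))  # CE increases
    print(bce_one_negative(scores, 0, 3), bce_one_negative(harder, 0, 3))  # BCE is identical
```

The BCE term only looks at the positive item and the one sampled negative. Raising the score of any unsampled item, here item 4, therefore leaves it unchanged, while the full cross-entropy penalizes that item. This is a general property of the two formulas.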


AI-generated summary of the arXiv paper 2309.07602v1

Recently, sequential recommendation and the next-item prediction task have gained popularity in the field of recommender systems. Two state-of-the-art baselines for these tasks are the Transformer-based models SASRec and BERT4Rec. Most previous publications comparing these algorithms report that BERT4Rec outperforms SASRec. However, the two models differ in a key respect, their loss functions: BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling and computes binary cross-entropy loss for one positive and one negative item. In this work, we investigate how training SASRec with the same loss as BERT4Rec affects its performance. Surprisingly, our experiments reveal that when both models are trained with the loss used by BERT4Rec, SASRec significantly outperforms BERT4Rec in terms of both quality and training speed. This finding challenges the prevailing notion that BERT4Rec is superior to SASRec.

Furthermore, we show that SASRec can be trained with negative sampling and still surpass BERT4Rec, provided the number of negative examples is much larger than one. This suggests that negative sampling, with enough negatives, can be an effective strategy for training SASRec. Overall, our findings show that SASRec outperforms BERT4Rec when trained in either of two ways. With BERT4Rec's loss, it is both more accurate and faster to train. With negative sampling using many negative examples, it still surpasses BERT4Rec. These results contribute to a deeper understanding of sequential recommendation models and provide insights into improving their performance.
Created on 24 Oct. 2023
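The summary also finds that SASRec trained with negative sampling can still beat BERT4Rec once the number of negatives is much larger than one. A sampled cross-entropy can illustrate this. The minimal Python sketch below assumes the loss is a softmax cross-entropy over the positive item plus N negatives drawn uniformly without replacement. The summary does not state the exact sampled loss or the sampling distribution, and `sampled_softmax_ce` is a hypothetical name.

```python
import math
import random
from typing import Sequence


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) via max subtraction."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sampled_softmax_ce(
    scores: Sequence[float], positive: int, num_negatives: int, rng: random.Random
) -> float:
    """Hypothetical helper: cross-entropy over the positive and num_negatives sampled items.

    Assumes uniform sampling without replacement from all non-positive items.
    """
    candidates = [i for i in range(len(scores)) if i != positive]
    negatives = rng.sample(candidates, num_negatives)
    logits = [scores[positive]] + [scores[j] for j in negatives]
    return log_sum_exp(logits) - scores[positive]


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # toy scores for 1000 items
    full = log_sum_exp(scores) - scores[0]  # full-softmax cross-entropy for item 0
    for n in (1, 16, 256, 999):
        # Averaged over 20 draws: in expectation the sampled loss rises toward the full
        # loss as n grows, and with n = 999 (every other item) it equals it up to rounding.
        avg = sum(sampled_softmax_ce(scores, 0, n, rng) for _ in range(20)) / 20
        print(n, round(avg, 4), round(full, 4))
```

Under this assumption, the sampled loss is never larger than the full cross-entropy, because its denominator keeps only a subset of the items. With a single negative it is therefore a loose approximation. This gives one intuition for why the summary stresses using many more than one negative.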

