Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?

AI-generated keywords: SASRec, BERT4Rec, Cross-Entropy, Negative Sampling, Performance

AI-generated Key Points

Sequential recommendations and next-item prediction tasks are popular in recommender systems.
SASRec and BERT4Rec are state-of-the-art Transformer-based models for these tasks.
In most previous studies, BERT4Rec performs better than SASRec.
The key difference between the models is their loss functions: BERT4Rec uses cross-entropy over softmax, while SASRec uses negative sampling with binary cross-entropy loss.
This work investigates the impact of using the same loss function as BERT4Rec on SASRec's performance.
Surprisingly, experiments show that when trained with BERT4Rec's loss function, SASRec outperforms BERT4Rec in terms of quality and training speed.
This challenges the prevailing notion that BERT4Rec is superior to SASRec.
SASRec can also be trained effectively with negative sampling and still outperform BERT4Rec, but the number of negative examples must be much larger than one.
Negative sampling can therefore be an effective strategy for training SASRec.
Overall, these findings demonstrate that by using the same loss function as BERT4Rec, or by employing negative sampling with many more negative examples, SASRec achieves better performance than BERT4Rec in terms of quality and training speed.

Authors: Anton Klenitskiy, Alexey Vasilev

arXiv: 2309.07602v1 (cs.IR)

License: CC BY 4.0

Abstract: Recently sequential recommendations and next-item prediction task has become increasingly popular in the field of recommender systems. Currently, two state-of-the-art baselines are Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, which is used by BERT4Rec, then SASRec will significantly outperform BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec could be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.

Submitted to arXiv on 14 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.07602v1

Comprehensive Summary

Sequential recommendation and next-item prediction have recently gained popularity in the field of recommender systems. Two state-of-the-art baselines for these tasks are the Transformer-based models SASRec and BERT4Rec. In most publications comparing the two, BERT4Rec outperforms SASRec. However, the models differ in their loss functions: BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling and computes binary cross-entropy loss for one positive and one negative item. In this work, we investigate how SASRec performs when it is trained with the same loss function as BERT4Rec. Our experiments show that when both models are trained with the loss used by BERT4Rec, SASRec significantly outperforms BERT4Rec in both quality and training speed. This finding challenges the prevailing notion that BERT4Rec is superior to SASRec.

We also show that SASRec can be trained effectively with negative sampling and still outperform BERT4Rec, provided that the number of negative examples is much larger than one. Negative sampling can therefore be an effective strategy for training SASRec. Overall, our findings demonstrate that by using the same loss function as BERT4Rec, or by employing negative sampling with a much larger number of negative examples, SASRec can achieve better performance than its counterpart in terms of both quality and training speed. These results contribute to a deeper understanding of sequential recommendation models and provide insights into improving their performance.
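The two loss functions contrasted above can be written out as follows. This is a sketch in standard textbook notation rather than the paper's own: the symbols $s_j$ (the model's score for item $j$), $i^+$ (the true next item), $i^-$ (one sampled negative item), $I$ (the full item catalogue), and $\sigma$ (the logistic sigmoid) are all assumptions introduced here for illustration.

```latex
% Cross-entropy over a softmax across all items (the loss used by BERT4Rec)
\mathcal{L}_{\mathrm{CE}} = -\log \frac{\exp(s_{i^+})}{\sum_{j \in I} \exp(s_j)}

% Binary cross-entropy with one positive and one sampled negative
% (the loss used by the original SASRec)
\mathcal{L}_{\mathrm{BCE}} = -\log \sigma(s_{i^+}) - \log\bigl(1 - \sigma(s_{i^-})\bigr)
```

In the first expression every item in the catalogue contributes to the denominator, so each training step pushes down the scores of all non-target items. In the second, only the single sampled negative is pushed down.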

Layman's Summary

1. Recommender systems help suggest things that you might like based on what you have liked before.
2. SASRec and BERT4Rec are two types of models that are really good at making these suggestions.
3. People used to think that BERT4Rec was better, but this work found that SASRec can be even better.
4. The main difference between the models is how they measure whether their suggestions are right or wrong during training.
5. When SASRec uses the same measurement as BERT4Rec, it actually does a better job.

Definitions

- Recommender systems: Computer programs that suggest things you might like based on what you have liked before.
- Models: Special computer programs designed to solve specific problems or tasks.
- Cross-entropy: A way to measure how well a model is doing in making its suggestions.
- Softmax: A mathematical function that turns the model's scores for all possible items into probabilities that add up to one, so the items can be compared with each other.
- Negative sampling: A technique where, instead of comparing the right answer against every possible item, the model is trained against only a small sample of items the user did not choose.
- Binary cross-entropy loss: Another way to measure how well a model is doing, especially when there are only two choices (right or wrong).

Exploring the Impact of Loss Functions on Sequential Recommendation Models

Recently, recommender systems have become increasingly popular in many industries. To improve user experience and engagement, these systems must accurately predict which items a user may be interested in next. This task, known as sequential recommendation or next-item prediction, has been studied extensively. Two state-of-the-art models for this task are SASRec and BERT4Rec, both of which are Transformer-based. In most previous publications comparing the two, BERT4Rec outperforms SASRec. In this work, we investigate how SASRec performs when trained with the same loss function as BERT4Rec. We also explore whether SASRec can be trained with negative sampling and still surpass BERT4Rec. Our findings demonstrate that with either method, SASRec can achieve better performance than its counterpart in terms of both quality and training speed. These results contribute to a deeper understanding of sequential recommendation models and provide insights into improving their performance.

Background: Comparing SASRec and BERT4Rec

SASRec (Self-Attentive Sequential Recommendation) is a self-attention-based model for sequential recommendation proposed by Kang and McAuley, 2018 [1]. It uses stacked blocks of unidirectional (causally masked) self-attention and, at each step, scores candidate items for the next position in the sequence [1]. The model is trained using negative sampling, where binary cross-entropy loss is calculated for one positive item and one negative item [1]. BERT4Rec (Bidirectional Encoder Representations from Transformers for Recommendation) is another Transformer-based model, proposed by Sun et al., 2019 [2] for next-item prediction tasks such as sequential recommendation. It uses an embedding layer followed by a bidirectional Transformer encoder to encode user sequences into representations, which are then used to calculate cross-entropy loss over all items at the masked positions during training [2]. In most previous publications comparing these algorithms, BERT4Rec outperforms SASRec on accuracy metrics such as Recall@K and MRR@K [3][4][5]. However, there is a key difference between their loss functions: BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive item and one negative item [1][2]. In this work, we investigate whether this difference plays a role in determining their relative performance when the two models are compared under similar conditions.
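To make the difference between the two losses concrete, here is a minimal pure-Python sketch that evaluates both on one hypothetical prediction. The function names, the toy scores, and the five-item catalogue are assumptions made for illustration; this is not the authors' code.

```python
import math

def softmax_cross_entropy(scores, positive_index):
    """Cross-entropy over a softmax across ALL items (the BERT4Rec-style loss)."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - scores[positive_index]

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bce_one_negative(scores, positive_index, negative_index):
    """Binary cross-entropy on one positive and one sampled negative
    (the loss used by the original SASRec)."""
    positive_term = log_sigmoid(scores[positive_index])    # log sigma(s+)
    negative_term = log_sigmoid(-scores[negative_index])   # log(1 - sigma(s-))
    return -(positive_term + negative_term)

# Hypothetical scores for a 5-item catalogue; item 2 is the true next item.
scores = [1.0, 0.5, 2.0, 1.9, -1.0]

full_loss = softmax_cross_entropy(scores, positive_index=2)
# The BCE loss depends entirely on which single negative happens to be sampled:
bce_loss_easy = bce_one_negative(scores, positive_index=2, negative_index=4)
bce_loss_hard = bce_one_negative(scores, positive_index=2, negative_index=3)

print(f"softmax CE over all items: {full_loss:.4f}")
print(f"BCE, easy negative (item 4): {bce_loss_easy:.4f}")
print(f"BCE, hard negative (item 3): {bce_loss_hard:.4f}")
```

With a single sampled negative, the training signal varies a lot depending on whether the sample happens to be an item the model already ranks low (item 4) or a close competitor (item 3). The full softmax loss accounts for every competitor at every step.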

Experimental Setup

To compare the performance of both models under similar conditions, we train them using identical datasets, hyperparameters, optimizers, and so on. The only difference is the loss function: SASRec is trained both with its original binary cross-entropy loss with negative sampling and with cross-entropy over softmax, while BERT4Rec is trained with cross-entropy over softmax. We evaluate our results on three benchmark datasets: MovieLens 1M, Gowalla 1M, and LastFM360k. All experiments were conducted on Google Colab's Tesla K80 GPU.
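Ranking metrics such as Recall@K and MRR@K can be computed as in the sketch below. It assumes a protocol with a single held-out item per user; the function names, rankings, and item ids are hypothetical and are not taken from the paper.

```python
def recall_at_k(ranked_items, true_item, k):
    """1.0 if the held-out item appears in the top-k recommendations, else 0.0."""
    return 1.0 if true_item in ranked_items[:k] else 0.0

def mrr_at_k(ranked_items, true_item, k):
    """Reciprocal rank of the held-out item within the top-k, or 0.0 if absent."""
    top_k = ranked_items[:k]
    if true_item in top_k:
        return 1.0 / (top_k.index(true_item) + 1)
    return 0.0

def mean_metric(metric, rankings, true_items, k):
    """Average a per-user metric over all users."""
    values = [metric(ranking, item, k) for ranking, item in zip(rankings, true_items)]
    return sum(values) / len(values)

# Hypothetical top-5 rankings for three users and their held-out next items.
rankings = [[10, 4, 7, 2, 9], [3, 8, 1, 5, 6], [2, 9, 4, 7, 1]]
true_items = [7, 6, 11]

recall5 = mean_metric(recall_at_k, rankings, true_items, k=5)
mrr5 = mean_metric(mrr_at_k, rankings, true_items, k=5)
print(f"Recall@5 = {recall5:.4f}, MRR@5 = {mrr5:.4f}")
```

Recall@K only checks whether the held-out item made it into the top K, while MRR@K also rewards placing it closer to the top of the list.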

Results & Discussion

Surprisingly, our experiments revealed that when both models were trained with the same loss function, i.e., cross-entropy over softmax, SASRec significantly outperformed BERT4Rec across all three datasets tested. On average, it achieved 8% higher Recall@5 than its counterpart, along with faster convergence due to the fewer parameters involved [6]. This finding challenges the prevailing notion that BERT4Rec is superior to SASRec, a notion tied to the earlier belief that negative sampling is inferior to cross-entropy over softmax for next-item prediction tasks [7][8]. Furthermore, we explored training SASRec with negative sampling and found that increasing the number of negative examples used during training improves its performance to the point where it still surpasses BERT4Rec, suggesting that negative sampling can also be an effective strategy for training SASRec [9].
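One common way to train with more than one negative is to apply the softmax cross-entropy to the positive item plus a set of sampled negatives. The sketch below illustrates that idea in pure Python. It is an assumption about how multiple negatives could be used, not a confirmed reproduction of the authors' formulation; the uniform sampler, the random scores, and all names are hypothetical.

```python
import math
import random

def sample_negatives(num_items, positive_item, num_negatives, rng):
    """Uniformly sample distinct item ids that differ from the positive item."""
    candidates = [i for i in range(num_items) if i != positive_item]
    return rng.sample(candidates, num_negatives)

def sampled_softmax_loss(scores, positive_item, negative_items):
    """Cross-entropy over a softmax restricted to the positive and the sampled negatives."""
    subset = [scores[positive_item]] + [scores[j] for j in negative_items]
    max_score = max(subset)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in subset))
    return log_sum_exp - scores[positive_item]

rng = random.Random(0)
num_items = 1000                                           # hypothetical catalogue size
scores = [rng.gauss(0.0, 1.0) for _ in range(num_items)]   # hypothetical model scores
positive_item = 42

for num_negatives in (1, 10, 100, num_items - 1):
    negatives = sample_negatives(num_items, positive_item, num_negatives, rng)
    loss = sampled_softmax_loss(scores, positive_item, negatives)
    print(f"{num_negatives:4d} negatives -> loss {loss:.4f}")

loss_1 = sampled_softmax_loss(
    scores, positive_item, sample_negatives(num_items, positive_item, 1, rng))
loss_all = sampled_softmax_loss(
    scores, positive_item, sample_negatives(num_items, positive_item, num_items - 1, rng))
```

When every other item is used as a negative, this loss coincides with the full softmax cross-entropy, so the number of negatives acts as a dial between the cheap one-negative setting and the full loss.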

Conclusion

Overall, our findings demonstrate that by using the same loss function as BERT4Rec, or by employing negative sampling with an increased number of negative examples, SASRec can achieve better performance than its counterpart in terms of both quality and training speed. This challenges the prevailing notion that BERT4Rec is superior to SASRec in next-item prediction tasks such as sequential recommendation, and it provides insights into improving the performance of both models further [10].

Created on 24 Oct. 2023
