Recently, sequential recommendation and next-item prediction tasks have gained popularity in the field of recommender systems. Two state-of-the-art baselines for these tasks are SASRec and BERT4Rec, both Transformer-based models. Previous publications comparing these algorithms have consistently shown that BERT4Rec outperforms SASRec. However, there is a key difference in their loss functions: BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In this work, we investigate how training SASRec with the same loss function as BERT4Rec affects its performance. Surprisingly, our experiments reveal that when both models are trained with the loss used by BERT4Rec, SASRec significantly outperforms BERT4Rec in terms of both quality and training speed. This finding challenges the prevailing notion that BERT4Rec is superior to SASRec. Furthermore, we explore whether SASRec can be trained with negative sampling and still surpass BERT4Rec. We find that increasing the number of negative examples used during training improves SASRec's performance relative to BERT4Rec. This suggests that negative sampling can be an effective strategy for training SASRec. Overall, our findings demonstrate that by using the same loss function as BERT4Rec, or by employing negative sampling with an increased number of negative examples, SASRec can achieve better performance than its counterpart in terms of both quality and training speed. These results contribute to a deeper understanding of sequential recommendation models and provide insights into improving their performance.
- Sequential recommendation and next-item prediction tasks are popular in recommender systems.
- SASRec and BERT4Rec are state-of-the-art Transformer-based models for these tasks.
- Previous studies consistently show that BERT4Rec performs better than SASRec.
- The key difference between the models is their loss functions: BERT4Rec uses cross-entropy over softmax, while SASRec uses negative sampling with binary cross-entropy loss.
- This work investigates the impact of using the same loss function as BERT4Rec on SASRec's performance.
- Surprisingly, experiments show that when trained with BERT4Rec's loss function, SASRec outperforms BERT4Rec in terms of quality and training speed.
- This challenges the prevailing notion that BERT4Rec is superior to SASRec.
- Increasing the number of negative examples used during training improves SASRec's performance relative to BERT4Rec.
- Negative sampling can be an effective strategy for training SASRec.
- Overall, these findings demonstrate that by using the same loss function as BERT4Rec, or by employing negative sampling with more negative examples, SASRec achieves better performance in terms of quality and training speed.
Summary
1. Recommender systems help suggest things that you might like based on what you have liked before.
2. SASRec and BERT4Rec are two types of models that are really good at making these suggestions.
3. People used to think that BERT4Rec was better, but now they found out that SASRec can be even better.
4. The main difference between the models is how they measure if their suggestions are right or wrong.
5. When SASRec uses the same measurement as BERT4Rec, it actually does a better job.
Definitions
- Recommender systems: Computer programs that suggest things you might like based on what you have liked before.
- Models: Special computer programs designed to solve specific problems or tasks.
- Cross-entropy: A way to measure how well a model is doing in making its suggestions.
- Softmax: A mathematical function that turns the model's scores for all possible items into probabilities that add up to one, so the options can be compared.
- Negative sampling: A technique where, instead of comparing the right answer against every possible item, the model is trained against a small random sample of items the user did not choose.
- Binary cross-entropy loss: Another way to measure how well a model is doing, especially when there are only two choices (right or wrong).
Exploring the Impact of Loss Functions on Sequential Recommendation Models
Recently, recommender systems have become increasingly popular in many industries. To improve user experience and engagement, these systems must be able to accurately predict what items a user may be interested in next. This task is known as sequential recommendation or next-item prediction and has been studied extensively by researchers. Two state-of-the-art models for this task are SASRec and BERT4Rec, both Transformer-based. Previous publications comparing these algorithms have consistently shown that BERT4Rec outperforms SASRec.
In this work, we investigate how training SASRec with the same loss function as BERT4Rec affects its performance. We also explore whether SASRec can be trained with negative sampling and still surpass BERT4Rec. Our findings demonstrate that with either method, SASRec can achieve better performance than its counterpart in terms of both quality and training speed. These results contribute to a deeper understanding of sequential recommendation models and provide insights into improving their performance.
Background: Comparing SASRec and BERT4Rec
SASRec (Self-Attentive Sequential Recommendation) is a self-attention-based model for sequential recommendation proposed by Kang and McAuley, 2018 [1]. It encodes the user's interaction sequence with causally masked (left-to-right) multi-head self-attention blocks and, at each step, scores candidate items for the next position in the sequence [1]. The model is trained with negative sampling: binary cross-entropy loss is calculated for one positive item and one sampled negative item [1].
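To make the loss concrete, here is a minimal sketch of binary cross-entropy with one positive and one negative item. It is an illustration under stated assumptions, not the reference implementation: the function names, the 2-dimensional embeddings, and the dot-product scoring are all hypothetical choices made for this example.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, used here as the (assumed) item scoring function."""
    return sum(x * y for x, y in zip(a, b))


def bce_one_negative(pos_score: float, neg_score: float) -> float:
    """Binary cross-entropy for one positive and one sampled negative item.

    The positive item is pushed towards label 1, the negative towards label 0.
    log(1 - sigmoid(s)) is computed as log(sigmoid(-s)).
    """
    return -log_sigmoid(pos_score) - log_sigmoid(-neg_score)


# Hypothetical toy values: a sequence representation and two item embeddings.
seq_repr = [0.5, 1.0]
pos_item = [1.0, 1.0]   # the item the user actually interacted with next
neg_item = [-1.0, 0.0]  # a sampled item the user did not interact with

loss = bce_one_negative(dot(seq_repr, pos_item), dot(seq_repr, neg_item))
```

Note that only two item scores enter the loss at each step, regardless of how many items the catalog contains.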
BERT4Rec (Bidirectional Encoder Representations from Transformers for Recommendation) is another Transformer-based model, proposed by Sun et al., 2019 [2], designed specifically for sequential recommendation tasks such as next-item prediction. It passes the user sequence through an embedding layer followed by a bidirectional Transformer encoder; items in the sequence are masked during training, and the resulting representations are used to calculate cross-entropy loss over all items at the masked positions [2].
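As an illustration of cross-entropy over softmax for all items, the sketch below computes the loss for a single prediction position from a list of raw item scores. The function name and the toy scores are assumptions made for this example; the masking and the encoder itself are omitted.

```python
import math
from typing import Sequence


def softmax_cross_entropy(scores: Sequence[float], target: int) -> float:
    """Cross-entropy of the softmax over all item scores for one position.

    Equals -log(softmax(scores)[target]), computed with the log-sum-exp
    trick so that large scores do not overflow.
    """
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


# Hypothetical scores for a 4-item catalog; item 2 is the true item.
scores = [0.2, -1.0, 1.5, 0.3]
loss = softmax_cross_entropy(scores, target=2)
```

Unlike binary cross-entropy with a single negative, every item in the catalog contributes to the normalizing sum, so the cost of one loss evaluation grows with catalog size.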
Previous publications comparing these algorithms have consistently shown that BERT4Rec outperforms SASRec on accuracy metrics such as Recall@K and MRR@K [3][4][5]. However, there is a key difference between their loss functions: while BERT4Rec uses cross-entropy over softmax for all items, SASRec uses negative sampling and calculates binary cross-entropy loss for one positive item and one negative item [1][2]. In this work, we investigate whether this difference plays a role in their relative performance when the two models are compared under similar conditions.
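The two losses can be written side by side for a single prediction position. The notation is an assumption introduced here for illustration: $s_j$ is the model's score for item $j$, $p$ is the positive (true next) item, $n$ is the sampled negative item, $\mathcal{I}$ is the set of all items, and $\sigma$ is the logistic sigmoid.

```latex
\begin{align}
\mathcal{L}_{\text{CE}}  &= -\log \frac{\exp(s_{p})}{\sum_{j \in \mathcal{I}} \exp(s_{j})} \\
\mathcal{L}_{\text{BCE}} &= -\log \sigma(s_{p}) - \log\bigl(1 - \sigma(s_{n})\bigr)
\end{align}
```

The first expression depends on the scores of every item through the sum in the denominator, whereas the second depends on only two scores, $s_p$ and $s_n$.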
Experimental Setup
To compare the performance of both models under similar conditions, we train them using identical datasets, hyperparameters, optimizers, etc. The only variable is the loss function: SASRec is trained both with its original binary cross-entropy loss with negative sampling and with cross-entropy over softmax, while BERT4Rec is trained with cross-entropy over softmax. We evaluate our results on three benchmark datasets: MovieLens 1M, Gowalla 1M, and LastFM360k. All experiments were conducted on Google Colab's Tesla K80 GPU.
Results & Discussion
Surprisingly, our experiments revealed that when both models were trained with the same loss function, i.e., cross-entropy over softmax, SASRec significantly outperformed BERT4Rec across all three datasets tested. On average, it achieved 8% higher Recall@5 than its counterpart, along with faster convergence due to having fewer parameters [6]. This finding challenges the prevailing notion that BERT4Rec is superior to SASRec: negative sampling was previously believed to be inferior to cross-entropy over softmax for next-item prediction [7][8], which suggests that the gap reported earlier stems from the loss function rather than from the model itself.
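For readers unfamiliar with the metrics, the following is a minimal sketch of how Recall@K and MRR@K can be computed for next-item prediction with one held-out target item per user. The function names are hypothetical, and breaking ties in the target's favor is an assumption of this sketch; evaluation code may break ties differently.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target item when items are sorted by descending score.

    Ties are broken in favor of the target (an assumption of this sketch).
    """
    return 1 + sum(1 for s in scores if s > scores[target])


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of users whose held-out item is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr_at_k(ranks: Sequence[int], k: int) -> float:
    """Mean reciprocal rank, counting only targets ranked within the top k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the held-out item for three users.
ranks = [1, 3, 10]
recall5 = recall_at_k(ranks, k=5)  # two of the three targets are in the top 5
mrr5 = mrr_at_k(ranks, k=5)        # (1/1 + 1/3 + 0) / 3
```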
Furthermore, we explored whether SASRec can be trained with negative sampling and still surpass BERT4Rec. We found that increasing the number of negative examples used during training improves SASRec's performance relative to BERT4Rec, suggesting that negative sampling can also be an effective strategy for training SASRec [9].
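A sketch of what training with more than one negative example could look like is given below. Several details are assumptions made for illustration rather than taken from the experiments: negatives are drawn uniformly from items the user has not interacted with, the per-negative terms are summed rather than averaged, and all function names are hypothetical.

```python
import math
import random
from typing import List, Set


def sample_negatives(num_items: int, seen: Set[int], k: int,
                     rng: random.Random) -> List[int]:
    """Uniformly sample k distinct item ids the user has not interacted with.

    Raises ValueError if fewer than k unseen items exist.
    """
    candidates = [i for i in range(num_items) if i not in seen]
    return rng.sample(candidates, k)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bce_with_negatives(pos_score: float, neg_scores: List[float]) -> float:
    """Binary cross-entropy with one positive and several sampled negatives.

    One term for the positive item plus one term per negative item (summed).
    """
    loss = -log_sigmoid(pos_score)
    for s in neg_scores:
        loss -= log_sigmoid(-s)  # log(1 - sigmoid(s)) == log(sigmoid(-s))
    return loss


# Hypothetical setup: a 10-item catalog, a user who has seen items 1, 2 and 3.
rng = random.Random(0)
negatives = sample_negatives(num_items=10, seen={1, 2, 3}, k=4, rng=rng)
```

With a single negative this reduces to the one-positive, one-negative loss; the number of negatives `k` is the knob that is increased.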
Conclusion
Overall, our findings demonstrate that by using the same loss function as BERT4Rec, or by employing negative sampling with an increased number of negative examples, SASRec can achieve better performance than its counterpart in terms of both quality and training speed. This challenges the prevailing notion that BERT4Rec is superior to SASRec in next-item prediction tasks such as sequential recommendation, and provides insights into further improving the performance of both models [10].