In their paper titled "An Embedding-Based Grocery Search Model at Instacart," authors Yuqing Xie, Taesik Na, Xiao Xiao, Saurav Manchanda, Young Rao, Zhihong Xu, Guanghua Shu, Esther Vasiete, Tejaswi Tenneti, and Haixun Wang address the challenge of optimizing e-commerce search by leveraging large yet noisy log data. The study introduces an embedding-based model specifically designed for grocery search on the Instacart platform. This system employs a two-tower transformer-based encoder architecture to learn representations of both user queries and product information. To overcome the cold-start problem commonly encountered in e-commerce search engines, the focus is placed on content-based features. To effectively train the model on noisy data, the researchers propose a self-adversarial learning method along with a cascade training approach. Through rigorous testing on an offline human evaluation dataset, they report a significant 10% relative improvement in RECALL@20 metrics. Furthermore, during online A/B testing scenarios, the model demonstrates notable enhancements with a 4.1% increase in cart-adds per search (CAPS) and a 1.5% boost in gross merchandise value (GMV). The authors delve into the details of how they trained and deployed this embedding-based search model while providing an insightful analysis of its effectiveness. Their findings shed light on the potential of utilizing advanced techniques to enhance e-commerce search functionalities and improve user experience within online grocery platforms like Instacart.
- Authors address the challenge of optimizing e-commerce search by leveraging large yet noisy log data
- Introduce an embedding-based model for grocery search on the Instacart platform
- System employs a two-tower transformer-based encoder architecture to learn representations of user queries and product information
- Focus on content-based features to overcome the cold-start problem in e-commerce search engines
- Propose a self-adversarial learning method and a cascade training approach to effectively train the model on noisy data
- Report a 10% relative improvement in Recall@20 on an offline human evaluation dataset
- Model shows gains in online A/B testing: a 4.1% increase in cart-adds per search (CAPS) and a 1.5% boost in gross merchandise value (GMV)
- Authors describe how they trained and deployed the embedding-based search model and analyze its effectiveness
Summary
- The authors worked on making online shopping searches better by using a lot of data, even if it's not perfect.
- They made a new way to search for groceries on Instacart.
- Their system uses a special structure to understand what people are looking for and details about products.
- They focused on using the content of items to help new products show up in searches.
- They came up with ways to teach the model well even with noisy data and saw good results when testing it.
Definitions
- E-commerce: Buying and selling things online
- Embedding-based model: A model that represents items such as queries and products as numeric vectors, so that related items end up close together
- Transformer-based encoder architecture: A neural network that uses attention to turn text into a vector representation
- Cold-start problem: Difficulty in recommending or showing new items to users
- Self-adversarial learning method: A training technique in which the model's own current predictions guide which training examples receive the most emphasis
- Cascade training approach: A method of training a model in successive stages rather than all at once
- Recall@20: The share of relevant items that appear in the top 20 search results
- Cart-adds per search (CAPS): The average number of items added to the cart per search
- Gross merchandise value (GMV): Total sales value generated by an e-commerce platform
Introduction
In today's fast-paced world, online shopping has become an integral part of our daily lives. With the rise of e-commerce platforms, consumers now have access to a vast array of products at their fingertips. However, with such a large selection comes the challenge of finding exactly what we are looking for quickly and efficiently. This is especially true in the case of grocery shopping, where users often have specific items in mind and need to find them among thousands of options.
To address this issue, researchers from Instacart - one of the leading online grocery platforms - have developed a new search model that leverages advanced techniques to optimize e-commerce search functionalities. In their paper titled "An Embedding-Based Grocery Search Model at Instacart," authors Yuqing Xie, Taesik Na, Xiao Xiao, Saurav Manchanda, Young Rao, Zhihong Xu, Guanghua Shu, Esther Vasiete, Tejaswi Tenneti and Haixun Wang present their findings on how they were able to significantly improve user experience through this innovative approach.
The Challenge: Optimizing E-Commerce Search
The primary goal of any e-commerce platform is to provide its users with an efficient and seamless shopping experience. However, achieving this can be challenging due to several factors. One significant obstacle is dealing with large yet noisy log data generated by user interactions on the platform.
Web search engines like Google or Bing match queries against documents. On an e-commerce platform like Instacart, users are searching for specific products among thousands of options with varying attributes (e.g., brand name or size), so a query has to be matched against product details rather than page text. As a result, it becomes challenging for traditional methods to accurately match these queries with relevant products.
Another significant challenge faced by e-commerce search engines is the cold-start problem - when there is limited or no information available about a new product. In such cases, traditional methods that rely on user behavior and historical data struggle to provide accurate results.
The Solution: An Embedding-Based Model
To overcome these challenges, the researchers at Instacart developed an embedding-based model specifically designed for grocery search. This system employs a two-tower transformer-based encoder architecture to learn representations of both user queries and product information.
The two-tower architecture consists of two separate neural networks - one for processing query data and the other for product data. These networks are trained jointly using large amounts of noisy log data from Instacart's platform. Each tower outputs a vector, and the similarity between a query vector and a product vector indicates how well the product matches the query. The transformer-based encoders let the model take the context of every token in a query or product description into account, making it more effective in handling complex queries.
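As a minimal sketch of how two-tower retrieval works, the example below replaces each transformer tower with a toy hashing encoder. The function names, the embedding size, and the sample products are all hypothetical; only the overall shape (encode the query, encode the products, rank by vector similarity) reflects the architecture described above.

```python
import hashlib
import math

DIM = 16  # embedding size; an arbitrary choice for this sketch


def _toy_encode(text):
    """Placeholder for a tower: hash tokens into buckets, then L2-normalize.

    A real tower would be a trained transformer encoder.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def encode_query(query):
    """Query tower (hypothetical stand-in)."""
    return _toy_encode(query)


def encode_product(product_text):
    """Product tower (hypothetical stand-in); a real system trains it separately."""
    return _toy_encode(product_text)


def score(query_vec, product_vec):
    """Dot product of two normalized vectors, i.e. cosine similarity."""
    return sum(q * p for q, p in zip(query_vec, product_vec))


def search(query, product_vecs, k=20):
    """Return the ids of the k products whose vectors best match the query."""
    q = encode_query(query)
    ranked = sorted(product_vecs.items(), key=lambda kv: score(q, kv[1]), reverse=True)
    return [pid for pid, _ in ranked[:k]]


# Made-up catalog; product vectors can be computed once, ahead of query time.
products = {
    "p1": "organic whole milk 1 gallon",
    "p2": "sourdough bread loaf",
    "p3": "almond milk unsweetened",
}
product_vecs = {pid: encode_product(text) for pid, text in products.items()}
results = search("whole milk", product_vecs, k=2)
```

One practical consequence of this design is that product vectors do not depend on the query, so they can be precomputed and only the query needs encoding at search time.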
Moreover, instead of relying solely on user behavior and historical data, this model focuses on content-based features to address the cold-start problem. By considering attributes like brand name, size, or category, the model can accurately match new products with relevant queries without any prior information.
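To illustrate why content-based features help with cold start, here is a small sketch of turning product attributes into the text a product encoder would read. The `Product` fields, the `product_text` helper, and the sample item are assumptions made for illustration; the paper's actual feature set is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical content fields; none of them depend on user behavior.
    name: str
    brand: str = ""
    size: str = ""
    category: str = ""


def product_text(p):
    """Join the non-empty content fields into one string for the product encoder."""
    fields = [p.brand, p.name, p.size, p.category]
    return " ".join(f.strip() for f in fields if f and f.strip())


# A brand-new item with no clicks or purchases still yields usable input.
new_item = Product(
    name="Oat Milk",
    brand="ExampleBrand",
    size="64 fl oz",
    category="Dairy Alternatives",
)
text = product_text(new_item)
```

Because the input is built only from catalog attributes, a product can be encoded and retrieved on the day it is added, before any interaction data exists.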
Training and Deployment
To effectively train their embedding-based search model on noisy data, the researchers propose a self-adversarial learning method along with a cascade training approach. Self-adversarial learning uses the model's own scores during training to decide how strongly each negative example should count, so that training concentrates on the most informative ones. The cascade training approach trains the model in stages, starting from the larger, noisier data and then refining it on smaller, higher-quality data.
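The description above stays at a high level, so the following is only a rough sketch of one common form of self-adversarial weighting from the embedding literature: negatives that the current model scores highly receive larger weights in the loss. Whether the authors use exactly this formulation is an assumption, and the function names, the `temperature` parameter, and the example scores are hypothetical. Cascade training is not sketched here.

```python
import math


def softmax(xs, temperature=1.0):
    """Softmax over scores; temperature must be positive."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def self_adversarial_loss(pos_score, neg_scores, temperature=1.0):
    """Loss for one query with one positive product and several negatives.

    Each negative is weighted by a softmax over the model's own scores, so
    harder negatives (higher scores) count more. In a real training loop the
    weights would be treated as constants when computing gradients.
    """
    weights = softmax(neg_scores, temperature)
    pos_term = -log_sigmoid(pos_score)
    neg_term = -sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))
    return pos_term + neg_term, weights


# Made-up similarity scores: the first negative is the hardest one.
loss, weights = self_adversarial_loss(2.0, [1.5, -1.0, -3.0])
```

With these inputs the first negative dominates the weighting, which is the intended effect: easy negatives that the model already rejects contribute little to the loss.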
On an offline human evaluation dataset, the researchers report a 10% relative improvement in Recall@20 over the baseline.
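For readers unfamiliar with the metric, the sketch below shows the standard definition of Recall@K and how a relative improvement is computed. The item ids and the 0.50 and 0.55 values are made up for illustration and are not figures from the paper, whose exact evaluation protocol is not described here.

```python
def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def relative_improvement(baseline, new):
    """Change relative to the baseline, e.g. 0.10 means 10% better."""
    return (new - baseline) / baseline


# Toy example: three relevant items, two of which are retrieved at all.
ranked = ["a", "b", "c", "d", "e"]
relevant = ["b", "e", "z"]
r2 = recall_at_k(ranked, relevant, k=2)  # only "b" is in the top 2
r5 = recall_at_k(ranked, relevant, k=5)  # "b" and "e" are in the top 5

# Hypothetical metric values: moving from 0.50 to 0.55 is a 10% relative gain.
gain = relative_improvement(0.50, 0.55)
```

Note that a relative improvement is measured against the baseline value, so a 10% relative gain is not the same as a 10 percentage point gain.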
The final step was deploying this embedding-based search model into production on Instacart's platform. Its effect was measured with online A/B testing, in which some users were shown results from the new model while the others continued to see results from the existing system. The new model delivered a 4.1% increase in cart-adds per search (CAPS) and a 1.5% boost in gross merchandise value (GMV) for users who saw results from the embedding-based model.
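As a sketch of how a metric like CAPS could be compared between the two groups of an A/B test, the example below aggregates cart adds per search for each group and computes the relative lift. It assumes CAPS means total cart adds divided by total searches, which follows from the name but is not spelled out here; the event format, the group labels, and the numbers are hypothetical.

```python
from collections import defaultdict


def caps_by_arm(search_events):
    """Compute cart-adds per search for each experiment arm.

    search_events is an iterable of (arm, cart_adds_for_that_search) pairs.
    """
    adds = defaultdict(int)
    searches = defaultdict(int)
    for arm, cart_adds in search_events:
        adds[arm] += cart_adds
        searches[arm] += 1
    return {arm: adds[arm] / searches[arm] for arm in searches}


# Made-up log: four searches per arm, each with its number of cart adds.
events = [
    ("control", 1), ("control", 0), ("control", 2), ("control", 1),
    ("treatment", 2), ("treatment", 1), ("treatment", 0), ("treatment", 2),
]
result = caps_by_arm(events)

# Relative lift of the treatment arm over the control arm.
lift = (result["treatment"] - result["control"]) / result["control"]
```

A real experiment would also need a significance test on the difference, which this sketch leaves out.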
Conclusion
In conclusion, this research paper by Instacart's team showcases how advanced techniques like transformer-based encoders and self-adversarial learning can significantly enhance e-commerce search functionalities. By developing an embedding-based model specifically designed for grocery search, they were able to overcome challenges like noisy data and cold-start problems effectively.
The findings of this study shed light on the potential of utilizing advanced techniques to improve user experience within online grocery platforms like Instacart. With further developments and advancements in technology, we can expect to see more innovative approaches being adopted by e-commerce platforms to optimize their search engines and provide users with a seamless shopping experience.