OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search

AI-generated keywords: OmniSearchSage, Pinterest search, entity representations, multitask learning, embedding techniques

AI-generated Key Points

Introduction of OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search
Achieved significant improvements in relevance (>8%), engagement (>7%), and ads click-through rate (CTR) (>5%) within Pinterest's production search system
Factors contributing to gains include enhanced content understanding, improved multi-task learning capabilities, and real-time serving functionalities
Incorporation of diverse text derived from image captions produced by a generative large language model (LLM), historical user engagement data, and user-curated board titles to enrich the entity representations
Production of a single search query embedding in the same space as pin and product embeddings through multitask learning setup
Conducted ablation studies to demonstrate the value of each feature, highlighting the effectiveness of the unified model compared to standalone counterparts
Deployment of embeddings across the Pinterest search stack for efficient retrieval and ranking processes while scaling to serve up to 300k requests per second at low latency
Significant contribution of synthetically generated descriptions towards enhancing the model's performance by providing additional context for pins lacking titles or descriptions
Improvement across all metrics including relevance and long-click rates by incorporating synthetic captions along with board titles and engaged queries as features in the model


Authors: Prabhat Agarwal, Minhazul Islam Sk, Nikil Pancha, Kurchi Subhra Hazra, Jiajing Xu, Chuck Rosenberg

arXiv: 2404.16260v1 (cs.IR)

8 pages, 5 figures, to be published as an oral paper in TheWebConf Industry Track 2024

License: CC BY 4.0

Abstract: In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. We jointly learn a unified query embedding coupled with pin and product embeddings, leading to an improvement of $>8\%$ relevance, $>7\%$ engagement, and $>5\%$ ads CTR in Pinterest's production search system. The main contributors to these gains are improved content understanding, better multi-task learning, and real-time serving. We enrich our entity representations using diverse text derived from image captions from a generative LLM, historical engagement, and user-curated boards. Our multitask learning setup produces a single search query embedding in the same space as pin and product embeddings and compatible with pre-existing pin and product embeddings. We show the value of each feature through ablation studies, and show the effectiveness of a unified model compared to standalone counterparts. Finally, we share how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve $300k$ requests per second at low latency. Our implementation of this work is available at https://github.com/pinterest/atg-research/tree/main/omnisearchsage.

Submitted to arXiv on 25 Apr. 2024


Results of the summarizing process for the arXiv paper: 2404.16260v1

Comprehensive Summary

In this paper, the authors introduce OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. By jointly learning a unified query embedding alongside pin and product embeddings, the system achieves significant improvements in relevance (>8%), engagement (>7%), and ads click-through rate (CTR) (>5%) within Pinterest's production search system. The key factors contributing to these gains are improved content understanding, better multi-task learning, and real-time serving. To enrich the entity representations, the authors incorporate diverse text derived from image captions produced by a generative large language model (LLM), historical user engagement data, and user-curated board titles. Through a multitask learning setup, a single search query embedding is produced in the same space as pin and product embeddings, and it remains compatible with pre-existing pin and product embeddings. Ablation studies demonstrate the value of each feature and show the effectiveness of the unified model compared to standalone counterparts. The authors also describe how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve 300k requests per second at low latency. Their implementation is available on GitHub.

Further analysis reveals that synthetically generated descriptions contribute significantly to the model's performance by providing additional context for pins lacking titles or descriptions. Incorporating synthetic captions along with board titles and engaged queries as features yields a noticeable improvement across all metrics, including relevance and long-click rates. Table 4 of the paper compares the impact of adding different text enrichments on the model's performance relative to continuous features only, and the results show consistent gains with each feature added to the baseline model. Overall, OmniSearchSage presents a comprehensive approach to improving search quality on Pinterest through advanced embedding techniques and diverse data sources for enhanced content understanding.
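To make the idea of a multitask setup with one shared query embedding more concrete, here is a minimal sketch. It assumes an in-batch softmax (contrastive) loss per task and a weighted sum across tasks; the loss form, the task names, the weights, and the toy vectors are illustrative assumptions and are not taken from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(query_embs, entity_embs):
    """Mean cross-entropy where entity i is the positive for query i and
    the other entities in the batch serve as negatives."""
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [dot(q, e) for e in entity_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(query_embs)


def multitask_loss(batches, weights=None):
    """batches maps a task name to (query_embs, entity_embs).
    The same query embeddings are reused across tasks, so every task
    pulls the query representation into one shared space."""
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * in_batch_softmax_loss(q, e)
        for task, (q, e) in batches.items()
    )


# Toy batch: two queries, their engaged pins, and their engaged products.
queries = [[1.0, 0.0], [0.0, 1.0]]
pins = [[1.0, 0.0], [0.0, 1.0]]
products = [[0.9, 0.1], [0.2, 0.8]]

loss = multitask_loss(
    {"query_pin": (queries, pins), "query_product": (queries, products)},
    weights={"query_product": 0.5},  # hypothetical task weight
)
print(f"combined loss: {loss:.4f}")
```

Because both tasks score against the same query vectors, lowering the combined loss moves pins and products toward the same query representation, which is the property the summary describes.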


Summary

Pinterest introduced a new system to make searching for things easier. They made search results more relevant and engaging, leading to more people clicking on ads. This was achieved by improving how the system understands content and learns from different tasks in real-time. They used text from image captions, user data, and board titles to make the system better. The system now combines search queries with pins and products for better results.

Definitions

- Versatile: Able to adapt or be used in various ways.
- Scalable: Capable of growing or expanding easily.
- Relevance: How closely something matches what you are looking for.
- Engagement: How much people interact with something.
- Ads click-through rate (CTR): The percentage of people who click on an advertisement after seeing it.
- Multi-task learning: Learning multiple things at the same time.
- Embeddings: Representations of data in a lower-dimensional space.
- Ablation studies: Experiments that remove certain features to see their impact on the overall model performance.
- Latency: The time it takes for a system to respond to a request.
- Synthetic: Artificially created or generated.

Introduction

Pinterest is a popular visual discovery platform that allows users to search for and save ideas, inspiration, and products. With over 400 million monthly active users, Pinterest has become a go-to destination for people looking to find inspiration or discover new products. As the platform continues to grow, it becomes increasingly important to provide relevant and engaging search results for its users. To improve the search experience on Pinterest, a team of researchers from the company has developed a versatile and scalable system called OmniSearchSage, which aims to better understand search queries, pins, and products within the platform. In this blog article, we will dive into the details of this research paper and explore how OmniSearchSage achieves significant improvements in relevance, engagement, and ads click-through rate (CTR) within Pinterest's production search system.

The Problem

The primary goal of any search engine is to provide relevant results that meet the user's intent. However, with millions of pins and products on Pinterest's platform, ensuring that every user gets accurate results can be challenging. This is where OmniSearchSage comes in: it jointly learns a unified query embedding alongside pin and product embeddings. Traditionally, most systems use separate embeddings for different types of data, such as text-based queries or image-based pins. This approach can lead to inconsistencies in understanding between different types of data. For example, an image may have multiple captions associated with it, while a text-based query may not have any images attached at all.

The Solution

To address these issues, OmniSearchSage uses advanced embedding techniques that allow it to understand both textual queries and visual content. By incorporating diverse text derived from image captions produced by a generative large language model (LLM), historical user engagement data, and user-curated board titles, OmniSearchSage gains a better understanding of the content on the platform. The key factor in the success of OmniSearchSage is its ability to jointly learn a unified query embedding alongside pin and product embeddings. This ensures compatibility between different types of data and leads to better multi-task learning, resulting in significant improvements in relevance, engagement, and ads CTR within Pinterest's production search system.
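As a rough illustration of why a shared embedding space is convenient, the sketch below scores one query embedding against both pins and products with the same cosine similarity. The entity IDs, the three-dimensional vectors, and the brute-force scan are assumptions made for readability; a production system would typically use an approximate nearest-neighbor index rather than scoring every candidate.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query_emb, candidates, k=2):
    """Return the IDs of the k candidates with the highest cosine
    similarity to the query. candidates maps entity ID -> embedding."""
    q = l2_normalize(query_emb)
    scored = [
        (sum(a * b for a, b in zip(q, l2_normalize(emb))), entity_id)
        for entity_id, emb in candidates.items()
    ]
    scored.sort(reverse=True)
    return [entity_id for _, entity_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings are learned and much larger.
pins = {
    "pin:boho_living_room": [0.9, 0.1, 0.0],
    "pin:chocolate_cake": [0.0, 0.2, 0.9],
}
products = {
    "product:rattan_chair": [0.8, 0.3, 0.1],
    "product:cake_stand": [0.1, 0.1, 0.8],
}

query = [1.0, 0.2, 0.0]  # one embedding, reused for both entity types
top_pins = top_k(query, pins, k=1)
top_products = top_k(query, products, k=1)
print(top_pins, top_products)
```

The point of the sketch is that the query is embedded once and the same vector is compared against every entity type, instead of maintaining one query encoder per type.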

The Implementation

To enrich the entity representations, the authors incorporate diverse text derived from image captions produced by a generative large language model (LLM), historical user engagement data, and user-curated board titles. These features are then used in a multitask learning setup where a single search query embedding is produced in the same space as pin and product embeddings. This allows for efficient retrieval and ranking while scaling to serve 300k requests per second at low latency. The authors conducted ablation studies to demonstrate the value of each feature used in OmniSearchSage. The results showed that each feature contributed to improving the performance of the model, with synthetically generated descriptions contributing significantly by providing context for pins that lack titles or descriptions. By incorporating synthetic captions along with board titles and engaged queries as features in the model, there was a noticeable improvement across all metrics, including relevance and long-click rates.
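One way to picture the text enrichment is as several optional text sources that are each turned into a vector and then combined into one pin representation. The sketch below is only an assumption about how that could look: the field names, the hashing of tokens into buckets, the deterministic stand-in for a learned embedding table, and the concatenation are all hypothetical and chosen to keep the example self-contained.

```python
import hashlib

DIM = 8             # hypothetical per-source embedding size
NUM_BUCKETS = 1000  # hypothetical hash vocabulary size
SOURCES = ("title", "synthetic_captions", "board_titles", "engaged_queries")


def token_bucket(token):
    """Map a token to a hash bucket so no fixed vocabulary is needed."""
    return int(hashlib.md5(token.encode()).hexdigest(), 16) % NUM_BUCKETS


def bucket_vector(bucket, dim=DIM):
    """Deterministic stand-in for a learned embedding table row."""
    digest = hashlib.sha256(str(bucket).encode()).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def embed_text(texts):
    """Average the bucket vectors of all tokens in a list of strings."""
    tokens = [t for text in texts for t in text.lower().split()]
    if not tokens:
        return [0.0] * DIM  # a missing source contributes a zero vector
    vecs = [bucket_vector(token_bucket(t)) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def embed_pin(pin):
    """Concatenate one vector per text source into a single feature vector."""
    return [x for source in SOURCES for x in embed_text(pin.get(source, []))]


# A pin with no title still gets signal from the enrichment sources.
pin = {
    "synthetic_captions": ["a slice of chocolate cake on a white plate"],
    "board_titles": ["Dessert recipes"],
    "engaged_queries": ["easy chocolate cake"],
}
features = embed_pin(pin)
print(len(features))
```

In this toy setup the title slot is all zeros for an untitled pin, while the caption, board-title, and engaged-query slots still carry information, which mirrors the role the summary attributes to synthetic captions.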

A Comparison

In Table 4 of their research paper, the authors provide a detailed comparison of adding different text enrichments on top of continuous features only. The results show consistent gains with each feature added to the baseline model. This highlights how OmniSearchSage takes into account textual context, user behavior, and curated board titles for enhanced content understanding.

Conclusion

In conclusion, OmniSearchSage presents a comprehensive approach to improving search quality on Pinterest through advanced embedding techniques and diverse data sources for enhanced content understanding. By jointly learning a unified query embedding alongside pin and product embeddings, OmniSearchSage has achieved significant improvements in relevance, engagement, and ads CTR within Pinterest's production search system. The authors have also shared their implementation of OmniSearchSage on GitHub for further exploration. This not only allows for transparency but also encourages collaboration and innovation in the field of search engine technology. As Pinterest continues to grow and evolve, OmniSearchSage is positioned to play an important role in providing users with relevant results, and its versatile and scalable design makes it well suited to improving the way people search for ideas and products on Pinterest.

Created on 12 May. 2025


