OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search

AI-generated keywords: OmniSearchSage, Pinterest search, entity representations, multitask learning, embedding techniques

AI-generated Key Points

  • Introduction of OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search
  • Achieved significant improvements in relevance (>8%), engagement (>7%), and ads click-through rate (CTR) (>5%) within Pinterest's production search system
  • Factors contributing to the gains include improved content understanding, better multi-task learning, and real-time serving
  • Enrichment of entity representations with diverse text derived from image captions generated by a generative large language model (LLM), historical user engagement data, and user-curated board titles
  • Production of a single search query embedding in the same space as pin and product embeddings through a multitask learning setup, compatible with pre-existing pin and product embeddings
  • Conducted ablation studies to demonstrate the value of each feature, highlighting the effectiveness of the unified model compared to standalone counterparts
  • Deployment of embeddings across the Pinterest search stack for efficient retrieval and ranking processes while scaling to serve up to 300k requests per second at low latency
  • Significant contribution of synthetically generated descriptions towards enhancing the model's performance by providing additional context for pins lacking titles or descriptions
  • Improvement across all metrics including relevance and long-click rates by incorporating synthetic captions along with board titles and engaged queries as features in the model

Authors: Prabhat Agarwal, Minhazul Islam Sk, Nikil Pancha, Kurchi Subhra Hazra, Jiajing Xu, Chuck Rosenberg

8 pages, 5 figures, to be published as an oral paper in TheWebConf Industry Track 2024
License: CC BY 4.0

Abstract: In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. We jointly learn a unified query embedding coupled with pin and product embeddings, leading to an improvement of $>8\%$ relevance, $>7\%$ engagement, and $>5\%$ ads CTR in Pinterest's production search system. The main contributors to these gains are improved content understanding, better multi-task learning, and real-time serving. We enrich our entity representations using diverse text derived from image captions from a generative LLM, historical engagement, and user-curated boards. Our multitask learning setup produces a single search query embedding in the same space as pin and product embeddings and compatible with pre-existing pin and product embeddings. We show the value of each feature through ablation studies, and show the effectiveness of a unified model compared to standalone counterparts. Finally, we share how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve $300k$ requests per second at low latency. Our implementation of this work is available at https://github.com/pinterest/atg-research/tree/main/omnisearchsage.
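
As a rough illustration of the joint training idea in the abstract, the sketch below sums one contrastive loss per entity type while sharing the query side. The in-batch softmax loss, the task names, the weights, and the toy vectors are all assumptions made for illustration. The abstract does not specify the loss, and the authors' actual implementation is in the repository linked above.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(query_embs, entity_embs):
    """Mean cross-entropy where entity_embs[i] is the positive for
    query_embs[i] and every other entity in the batch is a negative.
    (Assumed loss, chosen because it is a common contrastive setup.)"""
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [dot(q, e) for e in entity_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(query_embs)


def multitask_loss(task_batches, task_weights):
    """Weighted sum of per-task losses. Each task pairs query embeddings
    with one entity type, so all tasks pull on the same query space."""
    return sum(
        task_weights[name] * in_batch_softmax_loss(q, e)
        for name, (q, e) in task_batches.items()
    )


# Toy 2-d embeddings (hypothetical values, not from the paper).
queries = [[2.0, 0.0], [0.0, 2.0]]
pins = [[2.0, 0.0], [0.0, 2.0]]      # aligned with their queries
products = [[0.0, 2.0], [2.0, 0.0]]  # deliberately misaligned

batches = {"query_pin": (queries, pins), "query_product": (queries, products)}
weights = {"query_pin": 1.0, "query_product": 1.0}
print(multitask_loss(batches, weights))
```

In this toy batch the query-pin task contributes a small loss and the query-product task a large one, so a gradient step on the summed loss would mainly move the product side toward the shared query space.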

Submitted to arXiv on 25 Apr. 2024

Results of the summarizing process for the arXiv paper: 2404.16260v1

In this paper, the authors introduce OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. By jointly learning a unified query embedding alongside pin and product embeddings, the system achieves improvements in relevance (>8%), engagement (>7%), and ads click-through rate (CTR) (>5%) within Pinterest's production search system. The key factors contributing to these gains are improved content understanding, better multi-task learning, and real-time serving.

To enrich entity representations, the authors incorporate diverse text derived from image captions generated by a generative large language model (LLM), historical user engagement data, and user-curated board titles. Through a multitask learning setup, a single search query embedding is produced in the same space as pin and product embeddings, ensuring compatibility with pre-existing embeddings. Ablation studies demonstrate the value of each feature and the effectiveness of the unified model compared to standalone counterparts. The authors also describe how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve 300k requests per second at low latency. Their implementation is available on GitHub.

Further analysis shows that synthetically generated descriptions contribute to the model's performance by providing additional context for pins lacking titles or descriptions. Incorporating synthetic captions along with board titles and engaged queries as features improves all metrics, including relevance and long-click rates. A detailed comparison in Table 4 illustrates the impact of adding different text enrichments on the model's performance relative to continuous features only. The results show consistent improvements with each additional feature added to the baseline model. Overall, OmniSearchSage presents a comprehensive approach to improving search quality on Pinterest through advanced embedding techniques and diverse data sources for content understanding.
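
To make the shared embedding space concrete, here is a minimal sketch of how one query embedding could score pins and products together. The entity ids, the 3-dimensional vectors, and the brute-force cosine scan are all hypothetical. The summary does not describe the retrieval index, and a production system would presumably use an approximate nearest-neighbor index, not a linear scan.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def top_k(query_emb, index, k):
    """Return the k entity ids whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda eid: cosine(query_emb, index[eid]), reverse=True)
    return ranked[:k]


# Hypothetical ids and toy embeddings; pins and products live in one index
# because they share the same space as the query embedding.
index = {
    "pin:1": [0.9, 0.1, 0.0],
    "pin:2": [0.0, 1.0, 0.0],
    "product:7": [0.8, 0.2, 0.1],
    "product:9": [0.0, 0.0, 1.0],
}
query_emb = [1.0, 0.0, 0.0]
print(top_k(query_emb, index, k=2))
```

Because both entity types share one space, a single query vector ranks pins and products in the same pass, with no separate query encoder per entity type.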
Created on 12 May. 2025

