Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

AI-generated keywords: Text Retrieval

AI-generated Key Points

  • Many text retrieval use cases require retrieving smaller segments of text rather than whole documents
  • Traditional chunking methods may result in a loss of contextual information from surrounding chunks
  • The late chunking method uses long-context embedding models (such as the open-source jina-embeddings-v2) to embed all tokens of a long text first, and applies chunking only after the transformer model, just before mean pooling
  • This approach ensures that chunk embeddings capture full contextual information, leading to superior results without additional training
  • Late chunking can be integrated into any long-context text embedding model
  • The code for this method is available on GitHub for reproducibility
  • Late chunking offers a promising solution to enhance text retrieval by preserving contextual information within chunk embeddings

Authors: Michael Günther, Isabelle Mohr, Bo Wang, Han Xiao

4 pages, early draft
License: CC BY-NC-SA 4.0

Abstract: Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be "over-compressed" in the embeddings. Consequently, practitioners often split text documents into smaller chunks and encode them separately. However, chunk embeddings created in this way can lose contextual information from surrounding chunks, resulting in suboptimal representations. In this paper, we introduce a novel method called "late chunking," which leverages long context embedding models to first embed all tokens of the long text, with chunking applied after the transformer model and just before mean pooling. The resulting chunk embeddings capture the full contextual information, leading to superior results across various retrieval tasks without the need for additional training. Moreover, our method is generic enough to be applied to any long-context embedding model.
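
The pooling step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the contextualized token embeddings have already been produced by one forward pass of a long-context model over the whole document, and that chunk boundaries are already known as token index ranges. The function names (`mean_pool`, `late_chunk_pool`) and the numeric values are made up for the example.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def late_chunk_pool(token_embeddings, spans):
    """Pool contextualized token embeddings into one vector per chunk.

    token_embeddings: one vector per token, assumed to come from a single
        forward pass of a long-context model over the whole document.
    spans: (start, end) token index pairs, end exclusive, one per chunk.
    """
    chunk_embeddings = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid token span: {(start, end)}")
        chunk_embeddings.append(mean_pool(token_embeddings[start:end]))
    return chunk_embeddings


# Made-up values standing in for real model output: 5 tokens, 2 dimensions.
token_embeddings = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0], [4.0, 0.0]]
spans = [(0, 2), (2, 5)]  # hypothetical chunk boundaries in token indices

chunk_embeddings = late_chunk_pool(token_embeddings, spans)
print(chunk_embeddings)  # [[2.0, 1.0], [2.0, 2.0]]
```

The only difference from conventional chunking is where the split happens: here the text is split after the token embeddings exist, so each token vector being averaged was computed with the whole document in view.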

Submitted to arXiv on 07 Sep. 2024

Results of the summarizing process for the arXiv paper: 2409.04701v1

In text retrieval, many use cases call for extracting smaller segments of text. Dense vector-based retrieval systems also tend to perform better with shorter segments, because the semantics are less likely to be "over-compressed" in the embeddings. Practitioners therefore often split text documents into smaller chunks and encode them separately. However, this can lose contextual information from surrounding chunks, leading to suboptimal representations.

To address this issue, the paper introduces a method known as late chunking. It uses long-context embedding models, such as the open-source jina-embeddings-v2, to first embed all tokens of a long text, and applies chunking only after the transformer model, just before mean pooling. The resulting chunk embeddings capture the full contextual information, leading to superior results across various retrieval tasks without requiring additional training. The method is also generic enough to be applied to any long-context embedding model.

The limitations of traditional chunking are illustrated with a Wikipedia article on Berlin that is split into chunks. The name "Berlin" is mentioned only in the first sentence, while later chunks refer to it with phrases like "its" and "the city," so an embedding model that sees each chunk in isolation struggles to link these references to Berlin. Late chunking overcomes this by encoding all tokens of the document before chunking, so that each chunk embedding retains context from the entire text. This leads to improved performance over conventional chunking across various retrieval benchmarks. To facilitate reproducibility, the code for this method has been made available on GitHub.

The paper covers related work in Section 2, explains the late chunking method in Section 3, presents an evaluation against traditional chunking in Section 4, and concludes in Section 5. In conclusion, late chunking offers a promising way to enhance text retrieval by preserving contextual information within chunk embeddings derived from long texts, and it can be added to existing retrieval pipelines without retraining the embedding model.
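
The Berlin illustration can be mimicked with a toy experiment. The sketch below is a rough analogy under loud assumptions: `toy_encode` is a hypothetical stand-in for a transformer (it simply mixes each token vector with the mean over its whole input), `static_vector` is an invented deterministic token embedding, and the two short documents are made up. None of this is jina-embeddings-v2 or the paper's code. It only shows why encoding before chunking lets context from the first chunk reach the second.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def static_vector(token, dim=4):
    """Invented context-free vector for a token, deterministic across runs."""
    code = sum(ord(ch) for ch in token)
    return [((code * (k + 3)) % 11) / 10.0 for k in range(dim)]


def toy_encode(tokens):
    """Hypothetical stand-in for a transformer: every output vector depends
    on every token in the input it was given."""
    statics = [static_vector(token) for token in tokens]
    context = mean_pool(statics)
    return [[0.5 * s + 0.5 * c for s, c in zip(vec, context)] for vec in statics]


def naive_chunking(tokens, spans):
    """Split first, then encode and pool each chunk in isolation."""
    return [mean_pool(toy_encode(tokens[start:end])) for start, end in spans]


def late_chunking(tokens, spans):
    """Encode the whole document once, then pool each chunk's token vectors."""
    encoded = toy_encode(tokens)
    return [mean_pool(encoded[start:end]) for start, end in spans]


# Two documents that differ only in the city named in the first chunk.
doc_a = "berlin is the capital . the city has many parks .".split()
doc_b = ["paris"] + doc_a[1:]
spans = [(0, 5), (5, 11)]  # token ranges of the two sentences

naive_a, naive_b = naive_chunking(doc_a, spans), naive_chunking(doc_b, spans)
late_a, late_b = late_chunking(doc_a, spans), late_chunking(doc_b, spans)

# The second chunk ("the city has many parks .") is identical in both documents.
print(naive_a[1] == naive_b[1])  # True: naive chunking cannot tell which city
print(late_a[1] == late_b[1])    # False: late chunking carries the city over
```

With naive chunking the second chunk gets the same embedding no matter which city the document is about, whereas the late-chunked embedding of the same chunk changes with the first sentence.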
Created on 16 Jan. 2025

