A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

AI-generated keywords: vector databases, high-dimensional data, approximate nearest neighbor search, algorithms, challenges

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Vector databases are designed to store high-dimensional data that traditional database management systems struggle to handle.
  • The approximate nearest neighbor search (ANNS) problem behind vector databases has been studied for a long time, and a considerable body of algorithmic literature addresses it.
  • The authors categorize studies based on their approach to solving the ANNS problem, including hash-based, tree-based, graph-based, and quantization-based methods.
  • Organizing algorithms within a framework helps readers understand diverse strategies for addressing challenges in high-dimensional data storage and retrieval.
  • The article presents an overview of the existing challenges for vector databases.
  • The authors also sketch how vector databases can be combined with large language models to open up new possibilities.
  • This survey serves as a valuable resource for researchers and practitioners seeking insights into cutting-edge techniques for managing high-dimensional data effectively.
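
To make the hash-based category concrete, the following is a minimal Python sketch of approximate nearest neighbor search using random-hyperplane hashing, next to a brute-force baseline. It is not taken from the paper (only its metadata is available here); the class name `HyperplaneLSH`, the helper names, and all parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_nearest(query, vectors):
    """Brute-force baseline: compare the query against every stored vector."""
    return max(range(len(vectors)),
               key=lambda i: cosine_similarity(query, vectors[i]))


class HyperplaneLSH:
    """Toy hash-based index (hypothetical, illustrative only)."""

    def __init__(self, dim, num_planes, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.buckets = {}   # signature -> list of vector indices
        self.vectors = []

    def _signature(self, v):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0
                     for p in self.planes)

    def add(self, v):
        self.vectors.append(v)
        self.buckets.setdefault(self._signature(v), []).append(len(self.vectors) - 1)

    def query(self, q):
        """Best match within q's bucket only; None if that bucket is empty."""
        candidates = self.buckets.get(self._signature(q), [])
        if not candidates:
            return None
        return max(candidates,
                   key=lambda i: cosine_similarity(q, self.vectors[i]))
```

The approximate query inspects only the vectors that share the query's signature, so it can miss the true nearest neighbor. Using more hyperplanes yields smaller buckets and faster queries at the cost of more misses, which is the usual accuracy and speed trade-off in ANNS.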

Authors: Yikun Han, Chunjiang Liu, Pengfei Wang

Abstract: A vector database is used to store high-dimensional data that cannot be characterized by traditional DBMS. Although there are not many articles describing existing or introducing new vector database architectures, the approximate nearest neighbor search problem behind vector databases has been studied for a long time, and considerable related algorithmic articles can be found in the literature. This article attempts to comprehensively review relevant algorithms to provide a general understanding of this booming research area. The basis of our framework categorises these studies by the approach of solving ANNS problem, respectively hash-based, tree-based, graph-based and quantization-based approaches. Then we present an overview of existing challenges for vector databases. Lastly, we sketch how vector databases can be combined with large language models and provide new possibilities.

Submitted to arXiv on 18 Oct. 2023


Results of the summarizing process for the arXiv paper: 2310.11703v1


The article "A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge" by Yikun Han, Chunjiang Liu, and Pengfei Wang surveys vector databases, which are designed to store high-dimensional data that traditional database management systems struggle to handle. Few articles describe existing vector database architectures or introduce new ones, but the approximate nearest neighbor search (ANNS) problem underlying them has been studied for a long time and has a considerable algorithmic literature. The authors aim to review the relevant algorithms comprehensively and categorize the studies by their approach to solving the ANNS problem: hash-based, tree-based, graph-based, and quantization-based methods. Organizing the algorithms within this framework helps readers understand the diverse strategies for storing and retrieving high-dimensional data. The article then gives an overview of the existing challenges for vector databases and sketches how vector databases can be combined with large language models to open up new possibilities. Overall, the survey is intended as a resource for researchers and practitioners who want a general understanding of this fast-growing research area.
Created on 30 Aug. 2025


