Survey of vector database management systems

Bibliographic Details
Published in: The VLDB Journal vol. 33, no. 5 (Sep 2024), p. 1591
Main Author: Pan, James Jie
Other Authors: Wang, Jianguo, Li, Guoliang
Published:
Springer Nature B.V.
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3256783920
003 UK-CbPIL
022 |a 1066-8888 
022 |a 0949-877X 
024 7 |a 10.1007/s00778-024-00864-x  |2 doi 
035 |a 3256783920 
045 2 |b d20240901  |b d20240930 
100 1 |a Pan, James Jie  |u Tsinghua University, Department of Computer Science and Technology, Beijing, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178) 
245 1 |a Survey of vector database management systems 
260 |b Springer Nature B.V.  |c Sep 2024 
513 |a Journal Article 
520 3 |a There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search a staggering half century and more. Driving this shift from algorithms to systems are new data intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs, however there is no comprehensive survey to thoroughly review these techniques and systems. We start by identifying five main obstacles to vector data management, namely the ambiguity of semantic similarity, large size of vectors, high cost of similarity comparison, lack of structural properties that can be used for indexing, and difficulty of efficiently answering “hybrid” queries that jointly search both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning techniques based on randomization, learned partitioning, and “navigable” partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, distributed query processing, data manipulation queries, and hardware accelerated query execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including “native” systems that are specialized for vectors and “extended” systems that incorporate vector capabilities into existing systems. 
We then discuss benchmarks, and finally outline research challenges and point the direction for future work. 
653 |a Data base management systems 
653 |a Data management 
653 |a Similarity 
653 |a Large language models 
653 |a Optimization 
653 |a Design 
653 |a Indexing 
653 |a Unstructured data 
653 |a Enumeration 
653 |a Electronic commerce 
653 |a Queries 
653 |a Fault tolerance 
653 |a Query processing 
653 |a Partitioning 
653 |a Barriers 
653 |a Chatbots 
653 |a Semantics 
700 1 |a Wang, Jianguo  |u Purdue University, Department of Computer Science, West Lafayette, USA (GRID:grid.169077.e) (ISNI:0000 0004 1937 2197) 
700 1 |a Li, Guoliang  |u Tsinghua University, Department of Computer Science and Technology, Beijing, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178) 
773 0 |t The VLDB Journal  |g vol. 33, no. 5 (Sep 2024), p. 1591 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3256783920/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3256783920/fulltext/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3256783920/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch