Vector DB: to buy or not to buy
If you work in data science, chances are you have sat through more than 5 vendor talks about vector DBs: how fast they are, how scalable they are, and how easy they are to use.
The key question is:
Which vector DB should you buy?
Realistically, 95% of data scientists had never used a vector DB a year ago. Yet vector DBs are nothing new. They have long been used in specialized applications that store vectors. One example is recommendation engines: customer behavior can be represented as vectors, which lets you use nearest-neighbor and other vector-based calculations. Another is embedding representations, where features (text or otherwise) are represented as vectors.
Even then, very few companies had datasets that justified investing in a specialized database, unless the dataset was massive and required fast computation.
Fast forward to 2023, and every enterprise wants one…
A vector DB has only 3 key features. Many vendors have added more on top of these.
- Ability to store a large number of vector objects
- Ability to connect to an embedding model, so that when you push data into the vector store it gets converted to a vector and, optionally, stored as text too
- Ability to retrieve vectors based on their distance to a query vector
First, we can store objects as vectors.
Second, we can retrieve the closest vectors to perform a search.
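To make the three features concrete, here is a minimal sketch of a toy in-memory vector store in plain Python. Everything in it is an assumption made for illustration: the `ToyVectorStore` class, the `toy_embed` function (a word count over a tiny hand-picked vocabulary standing in for a real embedding model), and the choice of cosine similarity as the distance. It does a brute-force scan, which no real product would do at scale.

```python
import math

# Hypothetical vocabulary for the stand-in embedding below.
VOCAB = ["cat", "dog", "car", "engine"]


def toy_embed(text):
    """Stand-in for an embedding model: counts vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    def __init__(self, embed=None):
        self.embed = embed   # feature 2: optional link to an embedding model
        self.items = []      # feature 1: stored (vector, text) pairs

    def add_vector(self, vector, text=None):
        self.items.append((list(vector), text))

    def add_text(self, text):
        if self.embed is None:
            raise ValueError("no embedding function configured")
        # Convert the text to a vector and keep the original text alongside it.
        self.add_vector(self.embed(text), text)

    def search(self, query_vector, k=3):
        # Feature 3: score every stored vector against the query, best first.
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for vector, text in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = ToyVectorStore(embed=toy_embed)
for document in ["cat and dog", "car engine", "dog dog dog"]:
    store.add_text(document)

results = store.search(toy_embed("dog"), k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The query "dog" ranks "dog dog dog" first and "cat and dog" second, because cosine similarity only looks at the direction of the vectors, not their length.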
Most databases are quite similar on these features. Most rely on the same family of search algorithms (approximate nearest neighbor). They differ in indexing speed and retrieval speed. The rest is unlikely to be a differentiator.
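For readers who have not met approximate nearest neighbor search, the sketch below shows the basic trade: look at only a small slice of the data instead of all of it, and accept that you may occasionally miss the true closest vector. It uses random-hyperplane hashing, picked here only because it fits in a few lines. That choice is an assumption of this example, not something the vendors above are said to use; production systems commonly rely on other index structures such as HNSW or IVF. The class name `HyperplaneLSHIndex` and all parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class HyperplaneLSHIndex:
    """Toy ANN index: vectors on the same side of every random plane share a bucket."""

    def __init__(self, dim, num_planes=6, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)
        ]
        self.vectors = []
        self.buckets = defaultdict(list)

    def _signature(self, vector):
        # One boolean per plane: which side of the plane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vector)) >= 0.0
            for plane in self.planes
        )

    def add(self, vector):
        idx = len(self.vectors)
        self.vectors.append(list(vector))
        self.buckets[self._signature(vector)].append(idx)
        return idx

    def _rank(self, query, candidates, k):
        ranked = sorted(
            candidates,
            key=lambda i: cosine_similarity(query, self.vectors[i]),
            reverse=True,
        )
        return ranked[:k]

    def search(self, query, k=1):
        # Approximate: only score the vectors in the query's own bucket.
        candidates = self.buckets.get(self._signature(query), [])
        return self._rank(query, candidates, k)

    def exact_search(self, query, k=1):
        # Exact baseline: score every stored vector.
        return self._rank(query, range(len(self.vectors)), k)


rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
index = HyperplaneLSHIndex(dim=8)
for vector in data:
    index.add(vector)

query = data[123]
approx = index.search(query, k=1)
exact = index.exact_search(query, k=1)
bucket_size = len(index.buckets[index._signature(query)])
print("approximate:", approx, "exact:", exact)
print("vectors scored:", bucket_size, "instead of", len(data))
```

Because the query here is a copy of a stored vector, it always lands in that vector's bucket, so both searches return the same index. For an arbitrary query the approximate search can miss the true nearest neighbor, which is the price paid for scoring fewer vectors. How each product tunes that trade-off is where the speed differences come from.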
So far, this sounds quite useless: all you can do is store and retrieve vectors. As a result, most applications were confined to 2 areas: search engines that match the query to relevant documents super…