24–09–2023: VectorDB


VectorDB: to buy or not to buy

If you work in data science, chances are you have sat through more than five vendor talks about vector DBs: how fast they are, how scalable they are, and how easy they are to use.

The key question is:

Which Vector DB should you buy?

Some History

Realistically, 95% of data scientists had never used a vector DB a year ago. Yet vector DBs are nothing new. They have long been used in specialized applications that need to store vectors. One example is recommendation engines, where customer behavior can be represented as vectors and can typically leverage nearest-neighbor search and other vector-based calculations. Another is embedding representations, where features (text or otherwise) are represented as vectors.

Even then, very few companies had datasets that justified investing in a specialized database. It only made sense when the dataset was massive and required fast computation.

Fast forward to 2023, and every enterprise wants one…

Technical aspects

Key Features:

A vector DB has only three key features. Many vendors have added features on top of these.

  • Ability to store a large number of vector objects
  • Ability to connect to an embedding model, so that data pushed into the vector store is converted to a vector and, optionally, stored as text too
  • Ability to retrieve vectors based on their distance to a query vector

First, we can store objects as vectors.

Second, we can retrieve the closest vectors to perform a search.
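
To make the three features concrete, here is a minimal in-memory sketch in plain Python. Everything in it is an assumption for illustration: `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts letters), `TinyVectorStore` is not any vendor's API, and the search is an exact brute-force scan rather than what a production vector DB would do.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a 26-dim letter-count vector."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    def __init__(self, embed):
        self.embed = embed  # feature 2: pluggable embedding model
        self.items = []     # feature 1: stored (vector, original text) pairs

    def add(self, text):
        # Convert the text to a vector and keep the text alongside it.
        self.items.append((self.embed(text), text))

    def search(self, query, k=1):
        # Feature 3: rank every stored vector by similarity to the query.
        q = self.embed(query)
        scored = [(cosine_similarity(q, vec), text) for vec, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore(toy_embed)
store.add("vector database")
store.add("recommendation engine")
store.add("customer behavior")
results = store.search("vector databases", k=1)
```

The scan in `search` touches every stored vector, which is fine for a few thousand items and is exactly the part that stops scaling on massive datasets.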

Most databases are quite similar on these features. Most use the same family of search algorithms (approximate nearest neighbor). They differ in indexing speed and retrieval speed. The rest is unlikely to be a differentiator.
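
As a rough illustration of what "approximate nearest neighbor" means, the sketch below uses random-hyperplane hashing, one simple member of that family. It is an assumption chosen for brevity, not the algorithm of any particular vendor, and names such as `LshIndex` are hypothetical. Vectors are hashed into buckets at indexing time, and a query only compares against its own bucket instead of the whole dataset. Candidates inside the bucket are re-ranked by squared Euclidean distance.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes (as normal vectors) with a fixed seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0.0 else 0
        for plane in hyperplanes
    )


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class LshIndex:
    def __init__(self, dim, num_planes=4, seed=0):
        self.hyperplanes = make_hyperplanes(num_planes, dim, seed)
        self.buckets = {}  # hash key -> list of (item_id, vector)

    def add(self, item_id, vector):
        # Indexing cost: one hash computation per vector.
        key = lsh_key(vector, self.hyperplanes)
        self.buckets.setdefault(key, []).append((item_id, vector))

    def search(self, query, k=1):
        # Retrieval only scans the query's bucket, so it can miss true neighbors.
        candidates = self.buckets.get(lsh_key(query, self.hyperplanes), [])
        ranked = sorted(candidates, key=lambda item: squared_distance(query, item[1]))
        return [item_id for item_id, _ in ranked[:k]]


def exact_search(items, query, k=1):
    """Brute-force baseline: compare the query against every vector."""
    ranked = sorted(items, key=lambda item: squared_distance(query, item[1]))
    return [item_id for item_id, _ in ranked[:k]]


rng = random.Random(42)
items = [(i, [rng.uniform(-1.0, 1.0) for _ in range(8)]) for i in range(200)]
index = LshIndex(dim=8)
for item_id, vector in items:
    index.add(item_id, vector)

query = items[17][1]  # query with a vector that is already stored
approx = index.search(query, k=1)
exact = exact_search(items, query, k=1)
```

The trade-off is visible in `search`: fewer comparisons per query, at the price of occasionally missing a neighbor that landed in a different bucket. How well a product manages that trade-off is where indexing and retrieval speed diverge.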

So far, this sounds of limited use: all you can do is store and retrieve vectors. As a result, most applications were confined to two areas: search engines, to match the query to relevant documents super…


Strategy/Data/Leadership head of DS at OCBC ~~ exTwitter ~~ ex-gojek