The world of data is constantly evolving, and sometimes traditional relational databases struggle to keep pace. Vector data types and vector databases have emerged as a powerful solution for handling high-dimensional data in applications like recommendation systems, image recognition, and natural language processing. They let you store and manipulate multidimensional data points, which makes it possible to represent complex relationships and run similarity searches. In this post I’ll cover how to use vector data types in PostgreSQL with the pgvector extension, and how Drizzle ORM can offer a very smooth TypeScript experience.
Imagine data points not just as rows and columns, but as positions in a vast space. Vectors are arrays of numbers that locate a data point in that multi-dimensional space, and they serve as mathematical representations of complex data such as text, images, or audio.
Vector databases, therefore, specialize in storing, indexing, and querying vector data. They use distance measures (like Euclidean distance or cosine similarity) to find similarities between vectors, enabling fast and efficient retrieval of similar items from a large dataset. This capability is crucial for implementing features like search-by-image, recommendations, or any application requiring similarity searches at scale.
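To make the distance measures a bit more concrete, here is a minimal sketch in plain TypeScript. It assumes vectors are plain number arrays, and the names (`euclideanDistance`, `cosineSimilarity`, `nearestByEuclidean`, `StoredItem`) are made up for illustration. A real vector database uses indexes instead of scanning every row, so treat this only as a picture of what "similar" means.

```typescript
// A vector is represented here as a plain array of numbers.
type Vector = number[];

// Both measures only make sense for vectors with the same number of dimensions.
function assertSameDimensions(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
}

// Euclidean (L2) distance: 0 for identical vectors, larger means less similar.
function euclideanDistance(a: Vector, b: Vector): number {
  assertSameDimensions(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Cosine similarity: 1 for the same direction, 0 for orthogonal vectors.
function cosineSimilarity(a: Vector, b: Vector): number {
  assertSameDimensions(a, b);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical record shape: an id plus its embedding.
interface StoredItem {
  id: string;
  embedding: Vector;
}

// Brute-force nearest neighbours: score every item, sort ascending by distance, keep k.
function nearestByEuclidean(query: Vector, items: StoredItem[], k: number): StoredItem[] {
  return items
    .map((item) => ({ item, distance: euclideanDistance(query, item.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map((scored) => scored.item);
}
```

Calling `nearestByEuclidean(query, items, 5)` returns the five stored items whose embeddings are closest to the query.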
Common use cases for vector databases include recommendation systems, image recognition, and natural language processing.
While Postgres itself doesn't natively support vector data types, the pgvector extension bridges the gap. It adds the vector data type, allowing you to store multidimensional data points directly in your Postgres tables.
Drizzle ORM is a tool for TypeScript, designed to make it easier for developers to interact with databases. It provides a type-safe way to query and manipulate data in SQL databases, leveraging TypeScript's advanced type system for more reliable and maintainable code. I love how good their API is.
There’s an extensive guide on how to install pgvector and use it. The easiest way is to use their Docker image and run this command to enable the extension:
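As a rough sketch of that step, the helper below returns the SQL you would send to Postgres: the `CREATE EXTENSION` statement that turns pgvector on, followed by a table with a `vector` column. The function name `pgvectorSetupStatements` and the `items` table with its `embedding` column are assumptions for illustration, so swap in your own names.

```typescript
// Builds the SQL needed to enable pgvector and create a table with a vector column.
// Assumption: the "items" table and "embedding" column are illustrative names only.
function pgvectorSetupStatements(dimensions: number): string[] {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("dimensions must be a positive integer");
  }
  return [
    // The extension is registered under the name "vector".
    "CREATE EXTENSION IF NOT EXISTS vector",
    // vector(n) stores exactly n numbers per row.
    `CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(${dimensions}))`,
  ];
}
```

Each string can be passed, in order, to whatever query method your Postgres client exposes, or pasted into psql with a trailing semicolon.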
Declaring schemas with Drizzle and connecting to the database is as easy as:
Since both tools are fairly new, they don’t work together out of the box, but there are two ways to make it work.
The pgvector team offers integrations for several Node.js ORMs (Drizzle ORM being one of them). Here’s an example of using their tool with Drizzle ORM:
They also provide maxInnerProduct and cosineDistance functions for other ways of measuring the distance between vectors.
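Under the hood, these distance helpers correspond to pgvector's SQL operators: `<->` for L2 distance, `<#>` for negative inner product, and `<=>` for cosine distance. The sketch below is not the library's code, just a hand-rolled illustration of the kind of query you end up with. The `l2Distance` key, the `nearestNeighborQuery` function, and the `items` / `embedding` names are assumptions of mine.

```typescript
// Which pgvector operator each kind of distance maps to.
type DistanceKind = "l2Distance" | "maxInnerProduct" | "cosineDistance";

const DISTANCE_OPERATORS: Record<DistanceKind, string> = {
  l2Distance: "<->",
  // <#> returns the NEGATIVE inner product, so ascending order yields the largest product first.
  maxInnerProduct: "<#>",
  cosineDistance: "<=>",
};

interface ParameterizedQuery {
  text: string;
  values: [string, number];
}

// Builds a parameterized nearest-neighbour query.
// Assumption: a table "items" with a vector column "embedding" already exists.
function nearestNeighborQuery(
  kind: DistanceKind,
  embedding: number[],
  limit: number
): ParameterizedQuery {
  const operator = DISTANCE_OPERATORS[kind];
  return {
    text: `SELECT id FROM items ORDER BY embedding ${operator} $1::vector LIMIT $2`,
    // pgvector accepts vectors in the text form "[1,2,3]".
    values: [`[${embedding.join(",")}]`, limit],
  };
}
```

The operator only ever comes from the fixed lookup table and the vector travels as a bound parameter, so nothing user-supplied is concatenated into the SQL text.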
Drizzle ORM lets you define custom types for special use cases that you may have. Here I tried to create a custom type for vectors:
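A custom type mostly comes down to three pieces: the SQL type name, how a JavaScript value is written to the driver, and how the driver's value is read back. Below is a minimal, library-free sketch of those three pieces for vectors. The `vectorMapping` object and `VectorConfig` interface are hypothetical names, and the method names are picked to mirror the callbacks that Drizzle's `customType` helper takes. Wiring it into Drizzle is left out, so treat this as an assumption-laden outline rather than a finished integration.

```typescript
// Hypothetical config: how many dimensions the column holds.
interface VectorConfig {
  dimensions: number;
}

const vectorMapping = {
  // SQL type name used in the column definition, e.g. vector(1536).
  dataType(config: VectorConfig): string {
    return `vector(${config.dimensions})`;
  },

  // JavaScript -> database: pgvector accepts the text form "[1,2,3]".
  toDriver(value: number[]): string {
    if (!value.every((x) => Number.isFinite(x))) {
      throw new Error("Vector components must be finite numbers");
    }
    return JSON.stringify(value);
  },

  // Database -> JavaScript: the text form "[1,2,3]" is also valid JSON.
  fromDriver(value: string): number[] {
    const parsed: unknown = JSON.parse(value);
    if (!Array.isArray(parsed) || !parsed.every((x) => typeof x === "number")) {
      throw new Error(`Unexpected vector value: ${value}`);
    }
    return parsed as number[];
  },
};
```

Keeping the conversion in one small object makes it easy to unit test the round trip without a database.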