Storing Vector in Postgres with Drizzle ORM

March 21, 2024 · Updated on April 09, 2025

EDIT: Drizzle ORM now supports vectors out of the box, making implementation much easier.

You can now simply import the vector type from Drizzle and you’re good to go:

import { pgTable, vector } from 'drizzle-orm/pg-core';
 
const Products = pgTable('products', {
  embedding: vector('embedding', { dimensions: 1024 }),
});

Adding an index to your vector column is just as straightforward:

export const Products = pgTable(
  'products',
  {
    embedding: vector('embedding', { dimensions: 1024 }),
  },
  (t) => [
    index('products_embedding_index').using(
      'hnsw',
      t.embedding.op('vector_ip_ops'),
    ),
  ],
);

In this example, I’m using inner product as the distance function and HNSW as the index type. You can explore other indexing strategies and distance functions on the pgvector documentation page.

Drizzle ORM also offers helpers to make similarity queries super clean:

import { innerProduct } from 'drizzle-orm';
 
const products = await this.db
  .select({
    similarity: innerProduct(Products.embedding, embedding),
  })
  .from(Products)
  .orderBy((t) => asc(t.similarity))
  .limit(5);

Just a heads-up: drizzle-kit doesn’t automatically enable the pgvector extension—you’ll need to enable it manually.


The world of data is constantly evolving, and sometimes, traditional relational databases struggle to keep pace. Vector data types and vector databases have emerged as a powerful solution for handling high-dimensional data, in applications like recommendation systems, image recognition, and natural language processing and etc. They allow you to store and manipulate multidimensional data points, offering a powerful way to represent complex relationships and perform similarity searches. I’ll try to cover how to use vector data types in PostgreSQL using pgvector extension and how Drizzle ORM can offer a very smooth Typescript experience.

Vectors and Vector DBs

Imagine data points not just as rows and columns, but as positions in a vast space. Vectors are essentially arrays of numbers that can represent data points in multi-dimensional space. These vectors are mathematical representations of complex data, such as text, images, or audio, in a high-dimensional space.

Vector databases, therefore, specialize in storing, indexing, and querying vector data. They use distance measures (like Euclidean distance or cosine similarity) to find similarities between vectors, enabling fast and efficient retrieval of similar items from a large dataset. This capability is crucial for implementing features like search-by-image, recommendations, or any application requiring similarity searches at scale.

Common use cases for vector databases include:

  • Recommendation Systems: Finding similar products, movies, or articles based on a user's past behavior.
  • Image and Text Search: Retrieving visually or semantically similar images or text snippets.
  • Fraud Detection: Identifying anomalous patterns in financial transactions.

pgvector and Drizzle ORM

While Postgres itself doesn't natively support vector data types, the pgvector extension bridges the gap. It adds the vector data type, allowing you to store multidimensional data points directly in your Postgres tables.

Drizzle ORM is a tool for TypeScript, designed to make it easier for developers to interact with databases. It provides a type-safe way to query and manipulate data in SQL databases, leveraging TypeScript's advanced type system for more reliable and maintainable code. I love how good their API is.

Starting with pgvector

There’s an extensive guide on how to enable install pgvector and use it. The easiest way is to use their Docker image and run this command to enable the extension:

CREATE EXTENSION vector;

Declaring schemas with Drizzle and connecting to the database is as easy as:

import { pgTable, serial, text } from "drizzle-orm/pg-core";
import { drizzle } from "drizzle-orm/node-postgres";
import { Client } from "pg";
 
const client = new Client({
  connectionString: "postgres://user:password@host:port/db",
});
 
export const users = pgTable('users', {
  id: serial('id').primaryKey(),
  fullName: text('full_name'),
});
 
await client.connect();
const db = drizzle(client, { users });

Combining the two

Since we’re talking about two of the latest tools, they don’t natively go together but there are 2 ways to make it work.

pgvector-node

pgvector team offers a list of solutions for different ORMs of Node.js (one being Drizzle ORM). Here’s an example of using their tool with Drizzle ORM:

// Adding vector field
import { vector } from 'pgvector/drizzle-orm';
 
export const users = pgTable('users', {
  id: serial('id').primaryKey(),
  fullName: text('full_name'),
  embedding: vector('embedding', { dimensions: 256 })
});
 
// Finding nearest neighbors to a vector
import { l2Distance } from 'pgvector/drizzle-orm';
 
const nearest = await db.select()
  .from(users)
  .orderBy(l2Distance(users.embedding, [1, 2, 3, ..., 256]))
  .limit(5);

They also provide maxInnerProduct and cosineDistnace functions for different ways of finding distance between vectors

Creative way

Drizzle ORM lets to define custom types for special use cases that you may have. Here I tried to create a custom type for Vectors:

import { customType } from 'drizzle-orm/pg-core';
 
 
export const vector = customType<{
	data: number[];
	driverData: string;
	config: { size: number };
}>({
	dataType(config) {
		const dt =
			!!config && typeof config.size === "number"
				? `vector(${config.size})`
				: "vector";
		return dt;
	},
	fromDriver(value) {
		return JSON.parse(value);
	},
	toDriver(value) {
		return JSON.stringify(value);
	},
});
 
// Adding it to schema
export const users = pgTable('users', {
  id: serial('id').primaryKey(),
  fullName: text('full_name'),
  embedding: vector('embedding', { size: 256 })
});
 
// Querying nearest users (cosine similarity)
const nearest = await db
	.select({
		id: users.id,
		distance: sql
			.raw(`1 - (${users.embedding.name} <=> '[${vector}]')`)
			.as<number>("distance"),
	})
	.from(user)
	.orderBy(sql`distance DESC`)
	.limit(5)