The Foundation of Vector Search in Postgres

You installed pgvector, embedded a few rows, ran a query with <=>, and the results looked right. It feels finished.

It isn't. You're standing at the start of the part nobody writes about: the gap between a query that returns something and a search feature you can put in front of users, debug when it's wrong, and keep fast as the table grows. Every tutorial stops at "it returns something." This course is the rest.

No setup code in this lesson. It builds the mental model the other seven stand on: what an embedding actually is, why one Postgres table can hold text and images in the same space, and the exact moment your prototype starts lying to you.

An embedding is just a point

An embedding model takes an input (a sentence, a paragraph, an image) and returns a fixed-length list of numbers. That list is a coordinate. It drops the input at one specific point in a space with hundreds or thousands of dimensions.

You can't picture a thousand dimensions and you don't need to. Everything that matters about vector search is already visible in two. Drag the query point and watch what it lands nearest:

[Interactive figure: a 2D embedding space containing text and image points. Dragging the query point highlights its closest matches, for example "nearest: photo: puppy · d=6.3". Text and images share one space.]
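If you'd rather read it than drag it, here's a minimal sketch of the same idea in two dimensions. The coordinates and labels are invented for illustration; they are not real model output. The search logic is the part that carries over to a thousand dimensions unchanged.

```typescript
// A labeled point in a toy 2D "embedding space". Real embeddings have
// hundreds or thousands of dimensions, but the search logic is identical.
type Point = { label: string; kind: "text" | "image"; vector: number[] };

// Straight-line (Euclidean) distance between two equal-length vectors.
function euclidean(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Exact nearest neighbor: compare the query against every point.
function nearest(query: number[], points: Point[]): Point {
  if (points.length === 0) throw new Error("no points");
  return points.reduce((best, p) =>
    euclidean(query, p.vector) < euclidean(query, best.vector) ? p : best,
  );
}

// Hypothetical coordinates, made up for this example.
const points: Point[] = [
  { label: "text: a small cat", kind: "text", vector: [2, 8] },
  { label: "photo: kitten", kind: "image", vector: [3, 9] },
  { label: "photo: puppy", kind: "image", vector: [8, 7] },
  { label: "text: quarterly revenue", kind: "text", vector: [9, 1] },
];

const hit = nearest([2.5, 8.8], points);
console.log(`${hit.label} (${hit.kind})`);
```

Notice that nearest never looks at kind. Text and image points compete on distance alone, which is the whole point of a shared space.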

Two things are worth slowing down on.

First: nothing here knows what a cat is. The model was trained so that inputs humans treat as similar end up close together. Similarity isn't a property of your data. It's a property of the model that embedded it. Swap the model and every distance in that picture changes. (This is also why you can't mix vectors from two different models in one query. They live in different spaces.)

Second, and this is the part most tutorials skip entirely: the photo of the kitten and the text "a small cat" are sitting right next to each other. Different inputs, pixels and characters, same model, one space. That's not a party trick. It's the whole reason a multimodal model is worth using.

One space for text and images

This course uses Voyage's voyage-multimodal-3.5. You hand it text or an image, you get back a vector in the same coordinate space either way. One Postgres column, one index, one query, and it doesn't care whether the question was typed or screenshotted.

The call is deliberately boring. No SDK magic, just the documented endpoint:

type VoyagePiece =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: string };
 
async function embed(inputs: VoyagePiece[][]): Promise<number[][]> {
  const res = await fetch("https://api.voyageai.com/v1/multimodalembeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "voyage-multimodal-3.5",
      inputs: inputs.map((content) => ({ content })),
    }),
  });
 
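  // Surface API failures instead of trying to read `data` off an error body.
  if (!res.ok) {
    throw new Error(`Voyage request failed: ${res.status} ${await res.text()}`);
  }

  const json = (await res.json()) as { data: { embedding: number[] }[] };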
  return json.data.map((d) => d.embedding);
}
 
// A sentence and an image, embedded into the *same* space.
// `screenshotUrl` is assumed to be an image URL defined elsewhere in your code.
const [textVector, imageVector] = await embed([
  [{ type: "text", text: "a small cat" }],
  [{ type: "image_url", image_url: screenshotUrl }],
]);

Where it lands is an ordinary column. If you've set up pgvector with Drizzle before, this will look familiar (if not, I wrote the setup walkthrough separately):

import { pgTable, serial, text, vector, index } from "drizzle-orm/pg-core";
 
export const chunks = pgTable(
  "chunks",
  {
    id: serial("id").primaryKey(),
    source: text("source").notNull(),
    body: text("body").notNull(),
    // voyage-multimodal-3.5 returns 1024-dimensional vectors by default.
    embedding: vector("embedding", { dimensions: 1024 }),
  },
  (t) => [
    index("chunks_embedding_hnsw_idx").using(
      "hnsw",
      t.embedding.op("vector_cosine_ops"),
    ),
  ],
);

That 1024 is the first decision you can't cheaply walk back. voyage-multimodal-3.5 will also hand you 256, 512, or 2048 if you ask, but whatever you pick, every row has to agree on it. Changing your mind later means re-embedding the whole table. We deal with that properly in Lesson 8.

For now just notice what you already have: text and images, retrievable together, in a table that can also carry a source, timestamps, foreign keys, and every other thing Postgres has always given you. The vectors moved in next to your real data instead of into a second system. Keep that in mind: it's the through-line of the whole course.
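Since every row has to agree on the dimension, it's cheap to check before anything reaches the database. This is a minimal sketch, not part of the course project: assertDimensions and EMBEDDING_DIMENSIONS are hypothetical names, and the 1024 simply mirrors the column definition in the schema.

```typescript
// The dimension the column was declared with (1024 in the schema).
const EMBEDDING_DIMENSIONS = 1024;

// Throws if any vector would not fit the column or contains NaN/Infinity.
function assertDimensions(vectors: number[][], expected: number): void {
  vectors.forEach((v, i) => {
    if (v.length !== expected) {
      throw new Error(`vector ${i} has ${v.length} dimensions, expected ${expected}`);
    }
    if (!v.every((x) => Number.isFinite(x))) {
      throw new Error(`vector ${i} contains a non-finite value`);
    }
  });
}

// Usage, assuming `vectors` came back from an embedding call:
const vectors: number[][] = [new Array<number>(EMBEDDING_DIMENSIONS).fill(0.01)];
assertDimensions(vectors, EMBEDDING_DIMENSIONS); // passes silently
```

A check like this fails at the call site with a readable message, rather than at insert time or, worse, after a model swap quietly changed the output size.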

"Closest" depends on how you measure

Once everything is a point, search is just "which points are nearest the query." But nearest isn't one thing. The same vectors rank differently depending on the metric. Switch between them:

[Interactive figure: four documents ranked under each metric. Under cosine, which measures angle only and ignores magnitude: 1. doc B (1.00), 2. doc A (1.00), 3. doc D (0.88), 4. doc C (0.76).]

pgvector ships an operator for each: <=> for cosine distance, <-> for Euclidean (L2), <#> for negative inner product. They're not interchangeable. Cosine only cares about direction, so vector length is invisible to it. Inner product rewards longer vectors. Euclidean measures straight-line distance, so both direction and length count. With unit-length (normalized) embeddings all three collapse to the same ordering, and knowing when they do is the difference between results that feel right and results you can't explain. I went deeper on what each operator actually means if you want the long version.
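Here's a small sketch that makes the disagreement concrete. It reimplements the three quantities in plain TypeScript (it does not call pgvector), and the vectors are invented so that each metric picks a different winner. The helper names are mine, not a library's.

```typescript
type Doc = { id: string; vector: number[] };
type Metric = (a: number[], b: number[]) => number;

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a: number[]): number => Math.sqrt(dot(a, a));

// The quantities behind pgvector's operators. Smaller always means "nearer".
const cosineDist: Metric = (a, b) => 1 - dot(a, b) / (norm(a) * norm(b)); // <=>
const l2Dist: Metric = (a, b) => norm(a.map((x, i) => x - b[i])); // <->
const negInnerProduct: Metric = (a, b) => -dot(a, b); // <#>

// Scale a vector to length 1 without changing its direction.
const normalize = (a: number[]): number[] => a.map((x) => x / norm(a));

// Ids sorted from nearest to farthest under the given metric.
function rank(query: number[], docs: Doc[], metric: Metric): string[] {
  return [...docs]
    .sort((x, y) => metric(query, x.vector) - metric(query, y.vector))
    .map((d) => d.id);
}

// Hypothetical vectors, chosen so the metrics disagree.
const query = [1, 0];
const docs: Doc[] = [
  { id: "A", vector: [0.9, 0.1] }, // close by, slightly off-axis
  { id: "B", vector: [3, 1] }, // long vector, further off-axis
  { id: "C", vector: [0.5, 0] }, // same direction as the query, but short
];

console.log(rank(query, docs, cosineDist)); // C first: direction only
console.log(rank(query, docs, l2Dist)); // A first: straight-line distance
console.log(rank(query, docs, negInnerProduct)); // B first: length is rewarded

// After normalizing, all three metrics produce one ordering.
const unitDocs = docs.map((d) => ({ ...d, vector: normalize(d.vector) }));
console.log(rank(query, unitDocs, l2Dist));
```

Same three documents, three different top results, until you normalize. If your embedding model already returns unit-length vectors, the choice of operator changes cost more than it changes ranking; if it doesn't, the choice changes your results.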

Your first real query, in Drizzle, is small:

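// Assumes `db` is your Drizzle client, `chunks` is the table defined earlier,
// and `queryVector` is a number[] returned by embed().
import { cosineDistance, sql } from "drizzle-orm";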
 
const similarity = sql<number>`1 - (${cosineDistance(
  chunks.embedding,
  queryVector,
)})`;
 
const results = await db
  .select({ body: chunks.body, similarity })
  .from(chunks)
  .orderBy((t) => sql`${t.similarity} DESC`)
  .limit(10);

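cosineDistance returns a distance where 0 is identical, so 1 - distance flips it into a similarity where 1 is best. (Sorting by this derived expression also means pgvector can't use an index for it; an index only kicks in when the ORDER BY is the distance operator itself, ascending. That matters in a minute.) This is the query that "works" in every demo. It's also the query that's about to lie to you.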

The point where the prototype starts lying

Here's the failure that sends people to this course.

The query above is honest. It compares the query against every row, exactly. At a few hundred rows that's instant and perfect, so you demo it, it looks great, and the plan ships.

Then the table grows. To stay fast you add an approximate index, HNSW or IVFFlat (that choice gets two whole lessons later, and I've written a shorter overview too). Approximate is the word doing the damage. The index stops checking every row and checks a neighborhood instead. Untuned, it starts skipping the right answers. Nothing throws. The query still returns ten rows. They're just quietly, partially wrong.
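To see how "checks a neighborhood" turns into "skips the right answer," here's a toy version. It mimics the idea behind an IVF-style index (file each row under its closest centroid, then search only the list closest to the query). It is not pgvector's implementation, and the centroids, rows, and function names are all invented for illustration.

```typescript
type Row = { id: string; vector: number[] };

const dist = (a: number[], b: number[]): number =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

// Index of the centroid closest to v.
function closestCentroid(v: number[], centroids: number[][]): number {
  let best = 0;
  for (let i = 1; i < centroids.length; i++) {
    if (dist(v, centroids[i]) < dist(v, centroids[best])) best = i;
  }
  return best;
}

// Ids of the k rows nearest the query, among the rows given.
const topK = (query: number[], rows: Row[], k: number): string[] =>
  [...rows]
    .sort((a, b) => dist(query, a.vector) - dist(query, b.vector))
    .slice(0, k)
    .map((r) => r.id);

// Exact search: every row is a candidate.
const exactSearch = (query: number[], rows: Row[], k: number): string[] =>
  topK(query, rows, k);

// Approximate search: only rows filed under the query's closest centroid.
function approximateSearch(
  query: number[],
  rows: Row[],
  centroids: number[][],
  k: number,
): string[] {
  const list = closestCentroid(query, centroids);
  const candidates = rows.filter(
    (r) => closestCentroid(r.vector, centroids) === list,
  );
  return topK(query, candidates, k);
}

const centroids = [[0, 0], [10, 0]];
const rows: Row[] = [
  { id: "a", vector: [4.9, 0] }, // filed under centroid 0
  { id: "b", vector: [5.2, 0] }, // filed under centroid 1, just over the boundary
  { id: "c", vector: [1, 0] }, // filed under centroid 0
  { id: "d", vector: [9, 0] }, // filed under centroid 1
];
const query = [4.8, 0];

console.log(exactSearch(query, rows, 2)); // includes "b", the true second neighbor
console.log(approximateSearch(query, rows, centroids, 2)); // never looks at "b"
```

Row b is the second-closest row to the query, but it was filed on the other side of a boundary, so the approximate search returns c instead. No error, two rows back, one of them wrong.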

[Interactive figure: a full scan (exact) compared with an untuned ANN index for the same query and index settings, with dataset size adjustable from 500 to 500k rows. At 500 rows both read 99% (ANN recall@10 ≈ 99%) and everything looks fine.]

Drag it until the message changes. That gap between "looks fine" and "is missing results" is invisible unless you go measure it on purpose. Most teams never do, which is why a lot of vector search in production is subtly broken and nobody can put a number on how broken.

(Those numbers are illustrative. The real way to get them: run your query set through an exact scan to get the true neighbors, run the same set through the index, and diff the two. That's recall, and it needs no human labels. We build exactly that harness in Lesson 7.)
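The diff itself is only a few lines. This is a sketch of the arithmetic, not the Lesson 7 harness: it assumes you already have, for each query, the ids from the exact scan and the ids from the index, and the function names are hypothetical.

```typescript
// Fraction of the true top-k neighbors that the approximate result also returned.
function recallAtK(exactIds: number[], approxIds: number[], k: number): number {
  const truth = exactIds.slice(0, k);
  // Nothing to find: treated as perfect recall (a convention chosen for this sketch).
  if (truth.length === 0) return 1;
  const returned = new Set(approxIds.slice(0, k));
  const hits = truth.filter((id) => returned.has(id)).length;
  return hits / truth.length;
}

// Mean recall over a query set: one exact list and one approximate list per query.
function meanRecallAtK(
  pairs: { exact: number[]; approx: number[] }[],
  k: number,
): number {
  if (pairs.length === 0) throw new Error("empty query set");
  const total = pairs.reduce(
    (sum, p) => sum + recallAtK(p.exact, p.approx, k),
    0,
  );
  return total / pairs.length;
}

// The index returned three of the four true neighbors for this query.
console.log(recallAtK([1, 2, 3, 4], [1, 2, 9, 4], 4)); // 0.75
```

Order within the top k doesn't matter here, only membership. That's deliberate: recall asks whether the right rows were found, not whether they were ranked identically.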

So here's the honest version tutorials skip: pgvector is not magic, and there is a point where a naive setup is wrong. The good news is that point is not "go buy a vector database." It's "go understand the tradeoffs." Postgres handles tens of millions of vectors comfortably once you know which knobs to turn, and you keep your joins, your filters, your transactions, and one system to operate at 2am instead of two. (The community decision tree keeps converging on the same line: pgvector until you have a specific reason not to.)

When Postgres is enough, and when it isn't

I'll steelman the other side, because the honest answer has an edge.

For a real slice of teams, a dedicated vector database is worth it. Hundreds of millions of vectors, a hard sub-10ms budget, relentless write-and-reindex churn: that's a real workload and a purpose-built engine earns its keep there.

But that's not most teams reaching for one. Most are running Postgres already, haven't tuned it at all, and are about to bolt on a second database to solve a problem the first one would handle fine. You don't need Pinecone to find out you needed an ef_search value. If your vectors belong next to relational data you already query, and your scale is millions to low tens of millions, the answer is the database you already operate. Fewer moving parts wins almost every time.

I've watched this play out enough times that the migration story is predictable: team picks a dedicated vector DB before they've measured anything, spends a quarter on integration, then quietly moves back to Postgres when the spec turns out to be 200k rows and a metadata filter the other system was bad at anyway. (One of the more honest takes I've seen on r/Rag was someone running 180k docs on a managed vector store and transitioning to pgvector because the dedicated store wasn't earning it.)

And if you do outgrow vanilla pgvector before you outgrow Postgres, the next step isn't a different database. It's pgvectorscale, Timescale's companion extension that pushes the index out further on the same stack. We'll come back to it in Lesson 8.

What the rest of the course does

From here on, every lesson takes one of these stuck points and turns it into a decision you can make in your own codebase, against one continuous TypeScript project: a multimodal docs assistant that retrieves across text and screenshots.

1. Setup (Lesson 2). Voyage, Drizzle, and a Postgres you self-host and control down to the build flags.

2. Relevance (Lesson 3). Chunking real documents and debugging results that look plausible and are wrong.

3. Speed (Lessons 4 and 5). Choosing and building the index, then tuning Postgres and pgvector until recall and latency are both yours. This is the part almost nobody writes down.

4. Retrieval (Lesson 6). Hybrid search, reranking, and the RAG pattern done properly instead of done once. The shape of that pipeline exists as a post already; the lesson is where we make it good.

5. Evals (Lesson 7). Proving the system is good with a number, and catching the regression before your users do.

6. Operating it (Lesson 8). Zero-downtime re-embedding when the model changes, and watching the drift that quietly rots search over months.

You've got the model the rest of it stands on: embeddings are points, one multimodal model puts text and images in the same space, "nearest" depends on the metric, and an untuned index lies without ever raising its voice.

None of this requires a new vendor, a second datastore, or a dashboard you check at 2am. It's the database you already trust, doing one more thing well. That's the version of vector search I want more teams to have: boring, inspectable, and yours. Lesson 2 turns it on.