Most comparisons give you a feature table and send you off to "choose the right tool." You get throughput numbers and a managed-vs-self-hosted summary. This doesn't help when the real question is: do I add a second database to my stack, or do I make the one I'm already running do more?
If you already run Postgres, start with pgvector. It handles moderate scale, moderate QPS, strong filtering against your existing relational data, and consistency without any extra system. Move to Pinecone when vector search has genuinely become its own system - unpredictable growth, high-QPS filtered retrieval, multi-tenant isolation at scale, or when you'd rather pay someone else to tune it.
pgvector is a Postgres extension. Your vectors live in the same rows as your relational data, inside the same ACID transactions, queryable with the same SQL your app already writes. You can JOIN a similarity search against a users table and get consistent results without a sync job. Setting it up with Drizzle ORM is less than an hour of work.
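As a sketch of what that looks like — assuming a hypothetical `documents` table with an `embedding vector(1536)` column and an `owner_id` foreign key into `users` — the JOIN is just SQL:

```sql
-- Hypothetical schema: documents(id, owner_id, embedding vector(1536)),
-- users(id, org_id). <=> is pgvector's cosine distance operator.
SELECT d.id, d.embedding <=> $1 AS distance
FROM documents d
JOIN users u ON u.id = d.owner_id
WHERE u.org_id = $2
ORDER BY d.embedding <=> $1
LIMIT 10;
```

The result is transactionally consistent with the `users` table at query time — there's no window where the vector store and the relational store disagree.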
Pinecone is a managed serverless vector database. You talk to it over HTTP, you sync vectors to it separately from your source-of-truth store, and it lives in its own failure domain. That's not inherently bad - it's just the tradeoff you're accepting. If something goes wrong, you have two systems to debug, not one.
That one difference - extension vs separate system - drives almost every other tradeoff in this comparison.
pgvector gives you two index types. Pinecone gives you a managed black box.
HNSW is the right default for most workloads. It builds a navigable small-world graph over your vectors. Recall is high, query latency is low, and you can keep inserting rows without rebuilding the index. The downside: it's memory-hungry. Budget roughly 1.5-2x the raw vector data size for the index in RAM. For 1536-dim embeddings at 1M rows, that's 6 GB of raw data plus another 9-12 GB for the HNSW graph.
IVFFlat clusters vectors into lists and scans the nearest nprobe lists at query time. It builds faster and uses less memory than HNSW, but recall depends heavily on nprobe, and you have to run ANALYZE after bulk inserts or query quality degrades. It also degrades more under high update rates because clusters drift. Use IVFFlat if you're RAM-constrained and the dataset is mostly static.
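As a sketch, assuming a `documents` table with an `embedding vector(1536)` column, the two index definitions look like this. The HNSW parameters shown are pgvector's documented defaults; the IVFFlat numbers are common starting points, not tuned values:

```sql
-- HNSW: high recall, incremental inserts, ~1.5-2x extra RAM.
-- m and ef_construction shown at pgvector's defaults.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- IVFFlat: cheaper to build and hold in RAM; recall depends on probes.
-- Common heuristic: lists ≈ rows / 1000 for datasets up to ~1M rows.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);
SET ivfflat.probes = 10;  -- query-time recall/latency tradeoff
ANALYZE documents;        -- rerun after bulk inserts
```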
More on both index types and their tuning knobs at pgvector indexing.
| Workload | pgvector | Pinecone | Why |
|---|---|---|---|
| ≤1M vectors | Win | Viable | pgvector fits in RAM easily; no extra system |
| 1M-10M vectors | Viable | Viable | Depends on RAM, QPS, and filter selectivity |
| 10M+ vectors | Stretch | Win | HNSW memory and Postgres vacuum pressure become real |
| High update rate | Viable | Win | Pinecone handles upserts without index drift |
| Selective metadata filters | Challenging | Win | pgvector filters post-index; Pinecone filters pre-fetch |
| Multi-tenant workloads | Viable | Win | Namespace isolation is simpler in Pinecone at scale |
| Relational joins / ACL logic | Win | Loses | SQL joins on existing tables; no sync required |
| Bursty traffic | Stretch | Win | Pinecone scales horizontally with traffic; Postgres needs provisioned headroom |
Most RAG apps don't do pure ANN search. They do filtered ANN search: find the top-k documents similar to this query, for this user, in this document type, from this time range. How each system handles that filter matters more than raw similarity latency.
pgvector's approximate indexes (HNSW and IVFFlat) scan the index first, then apply the WHERE clause. If your filter is highly selective - say, one tenant out of 10,000 - the index might return candidates that almost all fail the filter, forcing Postgres to scan far more of the graph than expected. Recall drops, latency climbs. You can add a B-tree index on the filter column, and Postgres will sometimes choose a partial scan strategy, but this is workload-specific and requires tuning.
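A sketch of the pattern, with hypothetical table and column names:

```sql
-- HNSW returns nearest candidates first; the tenant filter is applied
-- afterward, so a very selective filter can leave few survivors in the top-k.
SELECT id
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;

-- Widening the candidate pool raises recall at the cost of latency
-- (hnsw.ef_search defaults to 40).
SET hnsw.ef_search = 200;
```

Raising `hnsw.ef_search` is the blunt instrument here; it helps, but it trades latency for recall across every query on the connection, not just the selective ones.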
Pinecone's metadata filtering is integrated into the retrieval step. It has purpose-built support for tenant-level namespaces and filter expressions that prune before candidate selection. For high-cardinality, highly selective filters, this is a genuine advantage.
This is the main thing pgvector articles handwave away. "Works at millions of rows" is true for pure similarity queries. Add a WHERE tenant_id = $1 AND document_type = 'contract' and the picture changes.
These numbers are rough estimates for orientation, not benchmarks. Vendor pricing changes; run your own math.
1M vectors, 1536 dimensions:
Raw vector storage: 1536 dims × 4 bytes = ~6 KB per row → ~6 GB for 1M rows. HNSW index adds roughly 9-12 GB. So you're looking at 15-18 GB of RAM to keep this hot. A db.r7g.large on RDS (16 GB RAM) costs around $175/month. You'll likely need db.r7g.xlarge (32 GB) at ~$350/month, depending on what else runs on the instance.
Pinecone at 1M vectors on the Starter plan is free up to a limit, then moves to a serverless model priced on Read Units and Write Units plus storage. At low-to-moderate query rates, 1M 1536-dim vectors runs roughly $70-120/month. Pinecone is cheaper at this scale if you're starting fresh and have no existing Postgres.
10M vectors, 1536 dimensions:
Raw storage: ~60 GB. HNSW overhead: 90-120 GB. You're now in db.r7g.4xlarge territory (~128 GB RAM, ~$1,400/month), and that's if vectors are the only heavy workload. You can offload to disk-based ANN with some recall loss, but you're firmly in "tune carefully" territory.
Pinecone at 10M vectors and moderate query rates runs $400-800/month depending on RU consumption. The managed overhead is real, but you get horizontal scale without capacity planning.
Be skeptical of any benchmark published by either vendor. They optimize for their own favorable workload.
pgvector inherits everything Postgres gives you: PITR, streaming replication, schema migrations, row-level security, connection pooling, and a monitoring ecosystem your team already knows. You don't add a new failure domain. You don't write a sync job. You don't manage two sets of API keys and two dashboards.
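Row-level security is worth calling out specifically, because it applies to vector queries like any other SELECT. A minimal sketch, assuming a `tenant_id` column and an app-defined `app.tenant_id` setting (both hypothetical):

```sql
-- RLS filters similarity queries with no separate enforcement layer.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id::text = current_setting('app.tenant_id', true));

-- The app sets the tenant per connection or transaction:
SET app.tenant_id = '42';
```

Every ORDER BY embedding query after that SET only ever sees one tenant's rows.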
With Pinecone, someone has to keep the vectors in sync with your source-of-truth. When your app updates a document, you update Postgres and re-upsert to Pinecone. When a document is deleted, you delete from both. This sounds simple until you have a bug, a failed job, or a network partition and Pinecone shows results for documents that no longer exist. That's the two-systems tax - and it compounds as the team grows.
Pinecone now supports sparse+dense hybrid search (BM25-style sparse vectors combined with dense ANN). It's genuinely useful.
Postgres can do the same thing natively, and it's better for most RAG apps. You can combine a tsvector full-text search or pg_trgm similarity with a vector cosine query in a single SQL statement, joined against your actual relational data. No sync, no separate index pipeline. Tools like pg_search make BM25 over Postgres straightforward. The Postgres advantage here isn't just capability - it's that the hybrid search runs in the same transaction as your user, permission, and metadata tables. Building a full RAG pipeline with Drizzle and pgvector shows how this fits together.
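A sketch of that single-statement hybrid, assuming a `documents` table with both a `search_tsv` tsvector column and an `embedding` vector column (names hypothetical):

```sql
-- Keyword prefilter with full-text search, then vector re-rank,
-- all in one statement against the source-of-truth table.
WITH keyword_hits AS (
  SELECT id
  FROM documents
  WHERE search_tsv @@ plainto_tsquery('english', $1)
)
SELECT d.id, d.embedding <=> $2 AS distance
FROM documents d
JOIN keyword_hits k ON k.id = d.id
ORDER BY distance
LIMIT 10;
```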
Unpredictable growth. If you're shipping a product where vector search volume might 10x in a month, Pinecone's horizontal scaling removes a class of problems you don't want during a growth event.
High-QPS filtered retrieval. If your workload is thousands of filtered similarity queries per second with high filter selectivity, Pinecone's architecture handles this better than a single Postgres instance.
Multi-tenant isolation at scale. Pinecone's namespace model makes per-tenant isolation clean and operationally simple at large tenant counts. Postgres can do this with row-level security and partitioning, but it's more work to get right.
Teams that don't want to tune Postgres. HNSW parameters, work_mem, vacuum behavior, connection pooling - these are learnable, but they're also a real time investment. If the team's time is better spent elsewhere and the budget supports it, Pinecone removes that entire category of work.
These are the signals that mean it's time to move off pgvector:
- Your p95 latency on filtered queries is missing the SLO and tuning hasn't helped.
- Your Postgres instance upgrades are driven primarily by vector RAM, not by your OLTP workload.
- The vector query load is causing autovacuum pressure or checkpoint storms that affect the rest of the app.
- Your per-tenant isolation requirements have outgrown what row-level security cleanly handles.
None of these happen at 100K vectors. Most of them don't happen at 1M. But if you're hitting them, Pinecone becomes rational - not because pgvector is broken, but because the vector search system has grown past what makes sense to run inside your application database.