Vector Databases — pgvector, Pinecone, Qdrant or Weaviate? How to Choose for RAG and AI Memory
A vector database is a store that holds texts (documents, knowledge chunks, agent memory) as embeddings — vectors of numbers — and instantly finds the ones most semantically similar to a query. It is the foundation of every RAG and AI memory system. Which one to choose? The rule by scale: up to ~10 million vectors the best choice is usually pgvector (an extension to the Postgres you probably already run) — vectors live next to your application data and you add no new infrastructure. Beyond 10M, or when vector search is your primary workload rather than a feature — move to a dedicated database: Qdrant for performance and filtering, Weaviate for hybrid search, Milvus for billion scale, Pinecone when you want zero infrastructure ops.
The complete guide to vector databases: what they are and why you need them for RAG, when pgvector in Postgres is enough and when you need a dedicated database, a pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus comparison (latency, cost, scale), the difference between HNSW and IVFFlat indexes, deployment code in pgvector, hybrid search and metadata filtering, and common migration mistakes.
You build a RAG chatbot on company documentation. It works great on 50 documents in the prototype. You deploy it on 50 thousand — and suddenly search takes seconds, database costs grow faster than traffic, and filtering by department or date does not work the way you thought. The problem is not the model. The problem is the layer you chose hastily at the start: the vector database.
This choice comes up in almost every article in this series — in RAG, in agent memory, in self-hosting — but it has never been covered on its own. Time to fix that. This article shows how vector databases work, which to choose for your scale, how the indexes differ, what it really costs and how to deploy it all in practice — with code.
What a vector database is and why you need one
AI models do not understand text directly — they turn it into embeddings, vectors of a few hundred to a few thousand numbers that encode meaning. Sentences with similar meaning have vectors close together in space, even if they use completely different words. "What is the notice period" and "after how many months can I terminate the contract" land next to each other despite sharing no keyword.
A vector database does three things:
- Stores embeddings — vectors along with the original text and metadata (source, date, department, permissions)
- Searches by similarity — for a query turned into a vector, it finds the top-K nearest vectors in the database (similarity search), usually by cosine measure
- Filters by metadata — combines semantic search with hard conditions ("only documents from 2026", "only the HR department")
This is exactly the retrieval mechanism RAG stands on (described in the article on building a knowledge base) and the agent's semantic memory (the AI memory article). Without an index you would have to compare a query against every stored vector one by one. That works for a few thousand chunks, but the cost grows linearly with the corpus and becomes too slow at millions. A vector database with an approximate nearest-neighbor index answers in milliseconds.
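To make the three operations concrete, here is a minimal sketch of exact (index-free) similarity search with a metadata filter in plain Python. The three-dimensional "embeddings", the record keys and the function names are all made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, records, k=2, department=None):
    """Brute-force search: filter by metadata, score every record, keep the k best."""
    candidates = [
        r for r in records
        if department is None or r["department"] == department
    ]
    scored = [(cosine_similarity(query, r["embedding"]), r["text"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy records: text + metadata + a hypothetical 3-dimensional embedding.
RECORDS = [
    {"text": "Notice period is three months", "department": "HR", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Contract termination rules", "department": "HR", "embedding": [0.8, 0.2, 0.1]},
    {"text": "Quarterly sales report", "department": "Sales", "embedding": [0.0, 0.1, 0.9]},
    {"text": "Office parking policy", "department": "HR", "embedding": [0.1, 0.9, 0.1]},
]

query_vector = [0.9, 0.1, 0.05]  # stands in for an embedded user question
results = top_k(query_vector, RECORDS, k=2, department="HR")
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Every query here touches every candidate record, which is the linear cost a vector index is meant to avoid.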
pgvector vs a dedicated database — the first decision
[Figure: Choosing a vector database by scale. Start with pgvector, move to a dedicated DB at scale.]
The most important decision is not "Pinecone or Qdrant", but "do I even need a dedicated vector database". For most companies the answer is: not yet. pgvector — an extension for PostgreSQL — lets you keep vectors in the database you probably already run and operate.
pgvector's biggest strength is not performance, it is integration: vectors live next to application data. A single SQL query combines semantic search with joins to user, order or permission tables. You do not sync two systems, do not maintain separate infrastructure, do not pay for another SaaS. For sets under 1M vectors pgvector with an HNSW index matches dedicated databases, and with proper tuning it handles up to ~10M.
A dedicated vector database starts paying off when:
- You exceed ~10M vectors — at that scale dedicated engines (Qdrant, Milvus) win on performance by an order of magnitude
- Vector search is your primary workload — not an add-on to the app but its core; then a specialized tool is worth it
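- You need features pgvector lacks natively: advanced hybrid search, horizontal scaling (sharding), GPU acceleration, or advanced quantization (pgvector 0.7+ offers half-precision and binary quantization, but not product quantization)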
- You have very high QPS — thousands of queries per second at high recall
The practical rule: start with pgvector, migrate to a dedicated database only when a specific limitation forces you to. Premature migration is the classic example of solving a problem you do not have yet.
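The rule of thumb above can be written down as a small decision helper. This is only a sketch of the article's heuristics: the `Workload` fields, the function names and the QPS threshold are assumptions chosen for illustration, not measured limits.

```python
from dataclasses import dataclass

DEDICATED_VECTOR_THRESHOLD = 10_000_000  # the article's ~10M rule of thumb
HIGH_QPS_THRESHOLD = 1_000               # assumption: where "very high QPS" starts

@dataclass
class Workload:
    vectors: int
    vector_search_is_core: bool = False
    needs_sharding_or_gpu: bool = False
    peak_qps: int = 0

def reasons_for_dedicated(w):
    """Collect the specific limitations that would justify a dedicated database."""
    reasons = []
    if w.vectors > DEDICATED_VECTOR_THRESHOLD:
        reasons.append("more than ~10M vectors")
    if w.vector_search_is_core:
        reasons.append("vector search is the primary workload")
    if w.needs_sharding_or_gpu:
        reasons.append("needs sharding or GPU")
    if w.peak_qps >= HIGH_QPS_THRESHOLD:
        reasons.append("very high QPS")
    return reasons

def recommend(w):
    """Default to pgvector; switch only when at least one concrete reason exists."""
    reasons = reasons_for_dedicated(w)
    if not reasons:
        return "pgvector"
    return "dedicated (" + "; ".join(reasons) + ")"

print(recommend(Workload(vectors=50_000)))
print(recommend(Workload(vectors=40_000_000, peak_qps=3_000)))
```

Returning the reasons alongside the verdict keeps the decision honest: if the list is empty, there is no limitation forcing a migration yet.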
Comparison: pgvector, Pinecone, Qdrant, Weaviate, Milvus
[Figure: pgvector vs Qdrant vs Pinecone vs Weaviate vs Milvus]
Once you have decided you need a dedicated database (or you are comparing deliberately), here is how the main players stack up in 2026:
| Database | Type | Latency p50 | Strength | Best for |
|---|---|---|---|---|
| pgvector | Postgres extension | Good up to ~1M vectors | Vectors next to data, no new infra | Most companies, < 10M vectors |
| Qdrant | Dedicated (Rust) | ~4 ms | Lowest latency, filtering, free tier | Performance and metadata filtering |
| Pinecone | Serverless SaaS | < 10 ms | Easiest to operate, zero DevOps | Startups, when velocity > cost |
| Weaviate | Dedicated (Go) | Good | Best hybrid search | Vector + keyword together |
| Milvus | Dedicated, GPU | ~6 ms | Billion scale, sharding | Very large datasets (100M+) |
How to read this:
- Pinecone is the easiest to operate and pays off when development velocity outpaces infrastructure cost — typical for startups. At enterprise scale it is hard to justify: the same workload on self-hosted Qdrant or Milvus costs 5–10× less
- Qdrant offers the lowest latency (~4 ms p50) and the best free tier; great when metadata filtering during search matters
- Weaviate has the best hybrid search — combining vector similarity with keyword matching in one query
- Milvus is the choice for billion scale: GPU acceleration, sharding, extreme throughput — but it demands the most DevOps knowledge
HNSW vs IVFFlat indexes — which to choose
The database itself is not everything — the index is key, deciding the trade-off between speed, accuracy (recall) and memory use. In pgvector (and most databases) you have two main options:
| Trait | HNSW | IVFFlat |
|---|---|---|
| Query speed | Higher, scales logarithmically | Lower at high recall |
| Accuracy (recall) | Better speed/recall trade-off | Weaker trade-off |
| Index build | Slower, more memory | Faster, less memory |
| Empty table | Works immediately | Needs data before building |
| Updates | Handles well | Worse with frequent changes |
| When to use | Default choice up to ~1M rows | Large, rarely-changed batch-loaded sets |
Practical guidance:
- HNSW is the safe default for applications under 1M rows — better speed/recall trade-off, works on an empty table, handles updates well
- Consider IVFFlat at scale, when you have millions of rows loaded in bulk and care more about fast index builds and lower memory than maximum query speed
- Critical production trap: an IVFFlat index does not belong in a schema migration — it belongs in scripts run after the data load. IVFFlat must "learn" cluster centers from existing data; built on an empty table it gives terrible recall
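Why IVFFlat needs data first is easier to see in a toy version of the idea. The sketch below is a simplified inverted-file index in plain Python, not pgvector's implementation: k-means learns cluster centers from existing rows, each row goes into the list of its nearest center, and a query scans only the `probes` nearest lists. All names and the 2-D data are illustrative, and raising an error on too little data is a choice made here to expose the dependency; a real database may behave differently.

```python
import math

def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))

def train_centroids(vectors, n_lists, iterations=5):
    """Plain k-means. It can only learn centers from rows that already exist."""
    if len(vectors) < n_lists:
        raise ValueError("not enough data to learn cluster centers")
    # Deterministic init for reproducibility: evenly spaced rows.
    # (Real implementations typically sample rows at random.)
    step = len(vectors) // n_lists
    centroids = [vectors[i * step] for i in range(n_lists)]
    for _ in range(iterations):
        buckets = [[] for _ in centroids]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        centroids = [
            tuple(sum(axis) / len(bucket) for axis in zip(*bucket)) if bucket else old
            for bucket, old in zip(buckets, centroids)
        ]
    return centroids

def build_lists(vectors, centroids):
    """Assign every vector to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for v in vectors:
        lists[nearest_centroid(v, centroids)].append(v)
    return lists

def ivf_search(query, centroids, lists, k, probes):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    by_distance = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [v for i in by_distance[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: math.dist(query, v))[:k]

def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)

# Toy data: three tight 2-D clusters of nine points each.
CENTERS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
OFFSETS = (-0.2, 0.0, 0.2)
VECTORS = [(cx + dx, cy + dy) for cx, cy in CENTERS for dx in OFFSETS for dy in OFFSETS]

centroids = train_centroids(VECTORS, n_lists=3)
lists = build_lists(VECTORS, centroids)

query = (10.05, 0.02)
exact = sorted(VECTORS, key=lambda v: math.dist(query, v))[:3]
approx = ivf_search(query, centroids, lists, k=3, probes=1)
print(recall(approx, exact))  # 1.0 on this toy data, scanning 9 of 27 vectors
```

The quality of the result depends entirely on how well the centers match the data, which is why the centers have to be learned after the load rather than before it.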
What it really costs
Cost varies dramatically with scale and hosting model. Approximate figures for typical configurations:
| Scale | pgvector (RDS) | Qdrant Cloud | Pinecone Serverless | Weaviate Cloud |
|---|---|---|---|---|
| 10M vectors | ~$45/mo | ~$65/mo | ~$70/mo | ~$135/mo |
| 100M vectors | < $100/mo (self-host) | moderate | $700+/mo | high |
The key observation: at 10M vectors the differences are small (tens of dollars), but at 100M the gap opens up. Pinecone can reach $700+/mo, while self-hosted Milvus or pgvector stays under $100. That is why managed SaaS pays off at the start (you save DevOps time while the team is small) and self-hosting wins at scale (you save money as traffic grows).
Watch the hidden costs of managed services — vector database bills regularly run 2.5–4× over budget when you do not watch embedding dimensionality, replica count and unused indexes. A 3072-dimension embedding takes twice the space of 1536 — at 100M vectors that is a real difference on the bill.
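The dimensionality effect is simple arithmetic. The sketch below assumes pgvector's documented storage size of 4 bytes per dimension plus 8 bytes per vector; it counts raw vector data only, so index size, row overhead and replicas come on top. The function name is illustrative.

```python
def vector_storage_bytes(n_vectors, dimensions, bytes_per_value=4, overhead=8):
    """Raw storage for single-precision vectors: n * (4 * dims + 8) bytes."""
    return n_vectors * (dimensions * bytes_per_value + overhead)

GB = 1000 ** 3  # decimal gigabytes

for dims in (1536, 3072):
    size = vector_storage_bytes(100_000_000, dims)
    print(f"{dims} dims: {size / GB:.1f} GB")
# 1536 dims: 615.2 GB
# 3072 dims: 1229.6 GB
```

Passing `bytes_per_value=2` gives a rough estimate for half-precision storage, one of the levers for cutting the bill without changing the model.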
How to deploy — pgvector in practice
The simplest production setup: PostgreSQL with pgvector, a table with embeddings, an HNSW index and a similarity-search query with metadata filtering. The whole setup is a dozen-odd lines of SQL:
```sql
-- 1. Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Table: text + embedding + metadata side by side
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    department text,
    created_at timestamptz DEFAULT now(),
    embedding vector(1536)  -- dimension depends on the embedding model
);

-- 3. HNSW index (the default choice up to ~1M rows)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- 4. Semantic search + metadata filter in one query
-- $1 = the query vector from the embedding model
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE department = 'HR'  -- hard metadata filter
  AND created_at > now() - interval '1 year'
ORDER BY embedding <=> $1  -- <=> is cosine distance
LIMIT 5;
```

Three things worth noting:
- **The <=> operator is cosine distance**: pgvector also has <-> (Euclidean) and <#> (negative inner product); for text embeddings you usually use cosine
- **The WHERE filter works together with the search**: this is pgvector's edge, because you combine semantics with hard SQL conditions without juggling two systems
- **vector(1536) is the embedding dimension**: it must match the model (OpenAI text-embedding-3-small: 1536, large: 3072); changing the model means rebuilding the column
You generate vectors separately — by calling an embedding model (OpenAI, Cohere, or a local one when self-hosting from the Ollama/vLLM article) — and insert them into the embedding column. The rest is plain SQL.
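On the application side, the insert step might look like the sketch below. It only builds the SQL and its parameters, so it runs without a database. The `%s` placeholders assume a psycopg-style driver, the zero vector stands in for a real model call, and the helper names are hypothetical. The `[x,y,z]` text form cast with `::vector` is the literal format pgvector accepts.

```python
EXPECTED_DIMENSIONS = 1536  # must match the vector(1536) column

def to_pgvector_literal(embedding, expected_dimensions=EXPECTED_DIMENSIONS):
    """Format a list of numbers as pgvector's text literal, e.g. "[1.0,2.5,-3.0]"."""
    if len(embedding) != expected_dimensions:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, column expects {expected_dimensions}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Assumption: psycopg-style %s placeholders; other drivers use $1, $2, ...
INSERT_SQL = (
    "INSERT INTO documents (content, department, embedding) "
    "VALUES (%s, %s, %s::vector)"
)

def build_insert(content, department, embedding):
    """Return (sql, params) ready to hand to a driver's execute() call."""
    return INSERT_SQL, (content, department, to_pgvector_literal(embedding))

# Stand-in for a real embedding model call (hypothetical placeholder data).
fake_embedding = [0.0] * EXPECTED_DIMENSIONS

sql, params = build_insert("Notice period is three months", "HR", fake_embedding)
print(sql)
print(to_pgvector_literal([1, 2.5, -3], expected_dimensions=3))  # [1.0,2.5,-3.0]
```

Checking the dimension before the insert turns a database error into an application-level one, which is easier to trace back to a model change.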
Hybrid search and metadata filtering
Pure vector search has a weak spot: it misses exact matches. When a user searches for a specific contract number "ORD-2026-0042" or a proper name, semantics do not help — you need a keyword match. The solution is hybrid search: combining vector search (meaning) with full-text or BM25 search (exact words), fusing results with an algorithm like Reciprocal Rank Fusion.
- Weaviate has the best native hybrid search among the compared databases — it is its main advantage
- Qdrant supports hybrid search via sparse + dense vectors
- pgvector can be combined with Postgres's built-in full-text search (tsvector); it works, but you fuse the results manually
- Metadata filtering is a separate, equally important feature — restricting search to documents of a given department, date or permission level; critical for security (a user should not get a document they have no access to in the results)
I wrote more on hybrid search and reranking in the advanced RAG article — that is where these techniques give the biggest jump in search quality.
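For the manual fusion case, Reciprocal Rank Fusion is short enough to sketch in full. Each document scores 1 / (k + rank) in every result list it appears in, and the scores are summed. The constant k = 60 is the value commonly used in the literature; the document IDs and the function name are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each appearance contributes 1 / (k + rank), with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for the same query from two retrievers.
vector_hits = ["doc_7", "doc_2", "doc_9"]    # semantic (vector) search
keyword_hits = ["doc_42", "doc_7", "doc_2"]  # full-text / BM25 search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_7', 'doc_2', 'doc_42', 'doc_9']
```

RRF uses only ranks, not raw scores, so cosine similarities and BM25 scores never need to be normalized onto a common scale.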
Common mistakes and migration
- Premature migration to a dedicated database — standing up Pinecone/Qdrant for 50 thousand vectors that would comfortably fit in pgvector; you add infrastructure and cost with no benefit
- IVFFlat on an empty table — an index built before loading data gives terrible recall; build IVFFlat after inserting data, in a post-load script, not in a migration
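- Mismatched embedding dimension: a vector(1536) column and a model returning 3072 dimensions is an insert error; changing the embedding model requires rebuilding the column and reindexing. Also note that pgvector indexes on the vector type support at most 2,000 dimensions, so indexing 3072-dimension embeddings requires the halfvec type or a model output reduced to fewer dimensions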
- Ignoring the cost of dimensionality — larger embeddings mean linearly larger memory and storage cost; at large scale consider a smaller embedding model or quantization
- No permission filtering — search returning documents without checking whether the user has access is a data breach; a permission-metadata filter is mandatory
- Mixing episodic and semantic memory in one index — described in the AI memory article; it degrades search quality
Migrating from pgvector to a dedicated database, when the time comes, is simple: you export vectors with metadata, import into the new database, repoint the retrieval layer in your app. Because you already have the embeddings computed, you do not regenerate them — you migrate the data, not recompute it.
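The export and import steps can be sketched with JSON Lines as the interchange format. This is one possible shape under stated assumptions, not a prescribed procedure: the row keys are hypothetical, an in-memory buffer stands in for a file, and the actual write to the target database is left out because it depends on that database's client.

```python
import io
import json

def export_jsonl(rows, out):
    """Write one JSON object per line: id, text, metadata and the stored embedding."""
    for row in rows:
        out.write(json.dumps(row) + "\n")

def import_in_batches(lines, batch_size):
    """Yield lists of parsed rows, so the target database receives bulk upserts."""
    batch = []
    for line in lines:
        if line.strip():
            batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Hypothetical rows as they might come out of the pgvector table.
ROWS = [
    {"id": 1, "content": "Notice period is three months", "department": "HR", "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "content": "Contract termination rules", "department": "HR", "embedding": [0.8, 0.2, 0.1]},
    {"id": 3, "content": "Quarterly sales report", "department": "Sales", "embedding": [0.0, 0.1, 0.9]},
]

buffer = io.StringIO()  # stands in for a file on disk
export_jsonl(ROWS, buffer)
buffer.seek(0)

batches = list(import_in_batches(buffer, batch_size=2))
print([len(batch) for batch in batches])  # [2, 1]
```

The embeddings travel through unchanged, so no embedding model is called during the migration.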
Vector database selection and deployment checklist
1. Start with pgvector if you have < 10M vectors and already use Postgres; do not add infrastructure unnecessarily
2. Move to a dedicated database when you exceed ~10M vectors or vector search is your primary workload
3. Choose the dedicated one by need: Qdrant (performance, filtering), Weaviate (hybrid search), Milvus (100M+ scale), Pinecone (zero DevOps)
4. Use the HNSW index as the default up to ~1M rows: the best speed/recall trade-off
5. Use IVFFlat only for large, batch-loaded sets, and build it AFTER loading data, not in a schema migration
6. Match the column dimension to the embedding model (1536 / 3072); a mismatch is an insert error
7. Plan cost by scale: managed at the start (time savings), self-host at scale (5–10× savings)
8. Watch hidden costs: embedding dimensionality, replicas, unused indexes; bills can run 2.5–4× over budget
9. Add permission-metadata filtering: a user must not get a document they have no access to in the results
10. Consider hybrid search if exact matches matter (numbers, proper names), not just semantics
11. Do not migrate prematurely; do it when a specific limitation forces you, not "just in case"
12. When migrating, export the computed embeddings; do not recompute them
Key takeaways
A vector database is the foundation of RAG and AI memory — it stores texts as embeddings and finds the most semantically similar. The most important decision is not "which provider", but "do I even need a dedicated database": for most companies and sets under 10M vectors pgvector in Postgres is the best start, because vectors live next to application data. Choose a dedicated database at scale or for special requirements: Qdrant (performance, ~4 ms), Weaviate (hybrid search), Milvus (billion scale), Pinecone (zero DevOps). Use the HNSW index by default; IVFFlat only for large batch sets and never on an empty table. Plan cost by scale (managed at the start, self-host 5–10× cheaper at scale), match the embedding dimension to the model, add permission filtering and do not migrate prematurely.
---
I help companies choose, deploy and optimize a vector database for RAG and AI memory — from the pgvector vs dedicated decision, through index and embedding-model selection, to hybrid search, permission filtering and cost optimization. Get in touch — I start with a free 30-minute analysis of your use case.
