AI & Automation 15 min

Vector Databases — pgvector, Pinecone, Qdrant or Weaviate? How to Choose for RAG and AI Memory

A vector database is a store that holds texts (documents, knowledge chunks, agent memory) as embeddings — vectors of numbers — and instantly finds the ones most semantically similar to a query. It is the foundation of every RAG and AI memory system. Which one to choose? The rule by scale: up to ~10 million vectors the best choice is usually pgvector (an extension to the Postgres you probably already run) — vectors live next to your application data and you add no new infrastructure. Beyond 10M, or when vector search is your primary workload rather than a feature — move to a dedicated database: Qdrant for performance and filtering, Weaviate for hybrid search, Milvus for billion scale, Pinecone when you want zero infrastructure ops.

The complete guide to vector databases: what they are and why you need them for RAG, when pgvector in Postgres is enough and when you need a dedicated database, a pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus comparison (latency, cost, scale), the difference between HNSW and IVFFlat indexes, deployment code in pgvector, hybrid search and metadata filtering, and common migration mistakes.

You build a RAG chatbot on company documentation. It works great on 50 documents in the prototype. You deploy it on 50 thousand — and suddenly search takes seconds, database costs grow faster than traffic, and filtering by department or date does not work the way you thought. The problem is not the model. The problem is the layer you chose hastily at the start: the vector database.

This choice comes up in almost every article in this series — in RAG, in agent memory, in self-hosting — but it has never been covered on its own. Time to fix that. This article shows how vector databases work, which to choose for your scale, how the indexes differ, what it really costs and how to deploy it all in practice — with code.

What a vector database is and why you need one

AI models do not understand text directly — they turn it into embeddings, vectors of a few hundred to a few thousand numbers that encode meaning. Sentences with similar meaning have vectors close together in space, even if they use completely different words. "What is the notice period" and "after how many months can I terminate the contract" land next to each other despite sharing no keyword.

A vector database does three things:

  • Stores embeddings — vectors along with the original text and metadata (source, date, department, permissions)
  • Searches by similarity: for a query turned into a vector, it finds the top-K nearest vectors in the database (similarity search), usually ranked by cosine similarity
  • Filters by metadata — combines semantic search with hard conditions ("only documents from 2026", "only the HR department")

This is exactly the retrieval mechanism RAG stands on (described in the article on building a knowledge base) and the agent's semantic memory (the AI memory article). Without an index you would have to compare a query against every stored vector one by one, a linear scan that is fine for a prototype but gets slower with every document you add. A vector database with an approximate nearest-neighbor index answers in milliseconds even at millions of vectors.
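To make the three jobs above concrete, here is a minimal sketch of what a vector store does when it has no index at all: score every row, apply the metadata filter, keep the best K. The 3-dimensional vectors, the `DOCS` rows and the `top_k` helper are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2, department=None):
    """Exact (brute-force) search: filter by metadata, score every row, keep the k best."""
    candidates = [r for r in rows if department is None or r["department"] == department]
    scored = [(cosine_similarity(query, r["embedding"]), r["text"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data (hypothetical): text + metadata + embedding stored side by side.
DOCS = [
    {"text": "Notice period is three months", "department": "HR", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Terminating a contract early", "department": "HR", "embedding": [0.8, 0.3, 0.1]},
    {"text": "Quarterly sales report", "department": "Sales", "embedding": [0.0, 0.2, 0.9]},
]

results = top_k([1.0, 0.2, 0.0], DOCS, k=2, department="HR")
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The loop touches every row on every query, which is exactly the cost an index such as HNSW is there to avoid.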

pgvector vs a dedicated database — the first decision

/// CHOOSING A VECTOR DATABASE BY SCALE

Start with pgvector, move to a dedicated DB at scale

| Scale | Recommended | Notes |
|---|---|---|
| < 1M vectors | pgvector + HNSW | The Postgres you already run; vectors next to app data |
| 1–10M vectors | pgvector or Qdrant | pgvector still fits; Qdrant when latency and filtering matter |
| 10–100M vectors | Qdrant / Weaviate | Dedicated DB; hybrid search, horizontal scaling |
| > 100M vectors | Milvus / Qdrant | Billion scale, GPU, sharding; self-host 5–10× cheaper |

Key figures: 10M vectors is the threshold to consider a dedicated DB; self-hosting is 5–10× cheaper than managed at scale; HNSW is the safe default index up to 1M rows.

The most important decision is not "Pinecone or Qdrant", but "do I even need a dedicated vector database". For most companies the answer is: not yet. pgvector — an extension for PostgreSQL — lets you keep vectors in the database you probably already run and operate.

pgvector's biggest strength is not performance, it is integration: vectors live next to application data. A single SQL query combines semantic search with joins to user, order or permission tables. You do not sync two systems, do not maintain separate infrastructure, do not pay for another SaaS. For sets under 1M vectors pgvector with an HNSW index matches dedicated databases, and with proper tuning it handles up to ~10M.

A dedicated vector database starts paying off when:

  • You exceed ~10M vectors — at that scale dedicated engines (Qdrant, Milvus) win on performance by an order of magnitude
  • Vector search is your primary workload — not an add-on to the app but its core; then a specialized tool is worth it
  • You need features pgvector lacks natively: advanced hybrid search, horizontal scaling (sharding), GPU acceleration, or quantization beyond the half-precision and binary vectors pgvector offers
  • You have very high QPS — thousands of queries per second at high recall

The practical rule: start with pgvector, migrate to a dedicated database only when a specific limitation forces you to. Premature migration is the classic example of solving a problem you do not have yet.
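The scale tiers from this section can be written down as a small lookup. This is only a sketch of the article's rule of thumb: the function name `recommend_store` and the exact boundary handling (10M and 100M fall into the lower tier) are my assumptions, and vector count is only one input.

```python
def recommend_store(vector_count):
    """Map a vector count to the starting recommendation from the scale tiers above."""
    if vector_count > 100_000_000:
        return "Milvus / Qdrant"        # billion scale, GPU, sharding
    if vector_count > 10_000_000:
        return "Qdrant / Weaviate"      # dedicated DB, hybrid search, horizontal scaling
    if vector_count >= 1_000_000:
        return "pgvector or Qdrant"     # pgvector still fits; Qdrant for latency and filtering
    return "pgvector + HNSW"            # the Postgres you already run


for count in (50_000, 5_000_000, 50_000_000, 500_000_000):
    print(f"{count:>11,} vectors -> {recommend_store(count)}")
```

A primary vector-search workload, very high QPS or a missing feature can justify a dedicated database earlier than the count alone suggests.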

Comparison: pgvector, Pinecone, Qdrant, Weaviate, Milvus

/// PGVECTOR vs QDRANT vs PINECONE vs WEAVIATE vs MILVUS

Comparison cards: pgvector (default), Qdrant (performance), Pinecone (managed), Weaviate (hybrid), Milvus (scale). Cost at 10M vectors: pgvector ~$45/mo (RDS), Qdrant ~$65/mo, Pinecone ~$70/mo, Weaviate ~$135/mo; Milvus at 100M vectors: < $100 self-hosted.

Key figures: 4 ms is the best p50 latency (Qdrant); the cost gap between options at 100M vectors is 10×; pgvector, Qdrant, Weaviate and Milvus are open source.

Once you have decided you need a dedicated database (or you are comparing deliberately), here is how the main players stack up in 2026:

| Database | Type | Latency p50 | Strength | Best for |
|---|---|---|---|---|
| pgvector | Postgres extension | Good to 1M | Vectors next to data, no new infra | Most companies, < 10M vectors |
| Qdrant | Dedicated (Rust) | ~4 ms | Lowest latency, filtering, free tier | Performance and metadata filtering |
| Pinecone | Serverless SaaS | < 10 ms | Easiest to operate, zero DevOps | Startups, when velocity > cost |
| Weaviate | Dedicated (Go) | Good | Best hybrid search | Vector + keyword together |
| Milvus | Dedicated, GPU | ~6 ms | Billion scale, sharding | Very large datasets (100M+) |

How to read this:

  • Pinecone is the easiest to operate and pays off when development velocity outpaces infrastructure cost — typical for startups. At enterprise scale it is hard to justify: the same workload on self-hosted Qdrant or Milvus costs 5–10× less
  • Qdrant offers the lowest latency (~4 ms p50) and the best free tier; great when metadata filtering during search matters
  • Weaviate has the best hybrid search — combining vector similarity with keyword matching in one query
  • Milvus is the choice for billion scale: GPU acceleration, sharding, extreme throughput — but it demands the most DevOps knowledge

HNSW vs IVFFlat indexes — which to choose

The database itself is not everything — the index is key, deciding the trade-off between speed, accuracy (recall) and memory use. In pgvector (and most databases) you have two main options:

| Trait | HNSW | IVFFlat |
|---|---|---|
| Query speed | Higher, scales logarithmically | Lower at high recall |
| Accuracy (recall) | Better speed/recall trade-off | Weaker trade-off |
| Index build | Slower, more memory | Faster, less memory |
| Empty table | Works immediately | Needs data before building |
| Updates | Handles well | Worse with frequent changes |
| When to use | Default choice up to ~1M rows | Large, rarely-changed batch-loaded sets |

Practical guidance:

  • HNSW is the safe default for applications under 1M rows — better speed/recall trade-off, works on an empty table, handles updates well
  • Consider IVFFlat at scale, when you have millions of rows loaded in bulk and care more about fast index builds and lower memory than maximum query speed
  • Critical production trap: an IVFFlat index does not belong in a schema migration — it belongs in scripts run after the data load. IVFFlat must "learn" cluster centers from existing data; built on an empty table it gives terrible recall
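Why IVFFlat needs data first is easier to see in a toy version of the idea. The sketch below is not pgvector's implementation: it picks centers by sampling existing vectors instead of running k-means, and the names `build_ivf`, `ivf_search` and `probes` are only illustrative. What it does show is the mechanism: vectors are grouped around centers derived from the data, and a query scans only the few lists whose centers are nearest.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, n_lists, seed=0):
    """Pick n_lists existing vectors as centers, then assign every vector to its nearest center."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, n_lists)   # needs data: fails on an empty set
    lists = [[] for _ in centers]
    for idx, vec in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: dist(vec, centers[c]))
        lists[nearest].append(idx)
    return centers, lists


def ivf_search(query, vectors, centers, lists, k, probes):
    """Scan only the `probes` lists whose centers are closest to the query."""
    order = sorted(range(len(centers)), key=lambda c: dist(query, centers[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]


def exact_search(query, vectors, k):
    """Brute-force baseline used to measure recall."""
    return sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(500)]
centers, lists = build_ivf(data, n_lists=10)

query = [0.5, 0.5]
exact = exact_search(query, data, k=5)
approx = ivf_search(query, data, centers, lists, k=5, probes=2)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"recall with 2 of 10 lists probed: {recall:.2f}")
```

Recall depends entirely on how well the centers describe the data. Centers chosen before the data exists cannot describe it, which is the reason to build the index in a post-load script.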

What it really costs

Cost varies dramatically with scale and hosting model. Approximate figures for typical configurations:

| Scale | pgvector (RDS) | Qdrant Cloud | Pinecone Serverless | Weaviate Cloud |
|---|---|---|---|---|
| 10M vectors | ~$45/mo | ~$65/mo | ~$70/mo | ~$135/mo |
| 100M vectors | < $100/mo (self-host) | moderate | $700+/mo | high |

The key observation: at 10M vectors the differences are small (tens of dollars), but at 100M the gap opens up. Pinecone can reach $700+/mo, while self-hosted Milvus or pgvector stays under $100. That is why managed SaaS pays off at the start (you save DevOps time while the team is small) and self-hosting wins at scale (you save money as traffic grows).

Watch the hidden costs of managed services — vector database bills regularly run 2.5–4× over budget when you do not watch embedding dimensionality, replica count and unused indexes. A 3072-dimension embedding takes twice the space of 1536 — at 100M vectors that is a real difference on the bill.
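The dimensionality effect is simple arithmetic. The sketch below assumes pgvector's single-precision `vector` type, which stores 4 bytes per dimension plus 8 bytes of overhead per vector; it counts raw vector data only, not the index, row overhead or replicas, so treat the results as a lower bound.

```python
def vector_bytes(dimensions):
    """Bytes for one single-precision vector: 4 per dimension plus 8 bytes of overhead."""
    return 4 * dimensions + 8


def raw_storage_gb(vector_count, dimensions):
    """Raw vector storage in decimal gigabytes, excluding indexes and replicas."""
    return vector_count * vector_bytes(dimensions) / 1e9


small = raw_storage_gb(100_000_000, 1536)
large = raw_storage_gb(100_000_000, 3072)
print(f"1536 dims: {small:.1f} GB, 3072 dims: {large:.1f} GB, ratio {large / small:.2f}x")
```

Multiply the result by the number of replicas and add the index on top to get closer to what a managed service will bill for.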

How to deploy — pgvector in practice

The simplest production setup: PostgreSQL with pgvector, a table with embeddings, an HNSW index and a similarity-search query with metadata filtering. The whole setup is a dozen-odd lines of SQL:

setup_pgvector.sql
```sql
-- 1. Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Table: text + embedding + metadata side by side
CREATE TABLE documents (
    id          bigserial PRIMARY KEY,
    content     text NOT NULL,
    department  text,
    created_at  timestamptz DEFAULT now(),
    embedding   vector(1536)   -- dimension depends on the embedding model
);

-- 3. HNSW index (the default choice up to ~1M rows)
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops);

-- 4. Semantic search + metadata filter in one query
--    $1 = the query vector from the embedding model
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE department = 'HR'               -- hard metadata filter
  AND created_at > now() - interval '1 year'
ORDER BY embedding <=> $1             -- <=> is cosine distance
LIMIT 5;
```

Three things worth noting:

  • The <=> operator is cosine distance: pgvector also has <-> (Euclidean) and <#> (negative inner product); for text embeddings you usually use cosine
  • The WHERE filter works together with the search: this is pgvector's edge, you combine semantics with hard SQL conditions without juggling two systems
  • vector(1536) is the embedding dimension: it must match the model (OpenAI text-embedding-3-small: 1536, large: 3072); changing the model means rebuilding the column

You generate vectors separately by calling an embedding model (OpenAI, Cohere, or a local one if you self-host, as covered in the Ollama/vLLM article) and insert them into the embedding column. The rest is plain SQL.
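On the application side, the step between "the model returned a list of floats" and "the row is inserted" can look like the sketch below. It relies on pgvector accepting the text form `[x1,x2,...]` as input for a vector value; the helper name `to_vector_literal` and the constant are hypothetical, and the embedding call and the database driver are left out on purpose.

```python
import math

EXPECTED_DIMENSIONS = 1536  # assumption: must match the vector(1536) column


def to_vector_literal(embedding, expected_dimensions=EXPECTED_DIMENSIONS):
    """Validate an embedding and format it as pgvector's text form, e.g. '[0.1,0.2]'."""
    if len(embedding) != expected_dimensions:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, column expects {expected_dimensions}"
        )
    if not all(math.isfinite(x) for x in embedding):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


print(to_vector_literal([0.1, 2, -3.5], expected_dimensions=3))
```

You would then pass the string as a bound parameter of the INSERT and cast it to `vector` in SQL. Checking the length in the application catches a swapped embedding model before the database rejects the row.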

Hybrid search and metadata filtering

Pure vector search has a weak spot: it misses exact matches. When a user searches for a specific contract number "ORD-2026-0042" or a proper name, semantics do not help — you need a keyword match. The solution is hybrid search: combining vector search (meaning) with full-text or BM25 search (exact words), fusing results with an algorithm like Reciprocal Rank Fusion.

  • Weaviate has the best native hybrid search among the compared databases — it is its main advantage
  • Qdrant supports hybrid search via sparse + dense vectors
  • With pgvector, you pair vector search with Postgres's built-in full-text search (tsvector); it works, but you fuse the results manually
  • Metadata filtering is a separate, equally important feature — restricting search to documents of a given department, date or permission level; critical for security (a user should not get a document they have no access to in the results)

I wrote more on hybrid search and reranking in the advanced RAG article — that is where these techniques give the biggest jump in search quality.
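If you fuse results manually, as with pgvector plus full-text search, Reciprocal Rank Fusion fits in a few lines. This is a minimal sketch: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is the value commonly used with RRF, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]    # ranked by embedding similarity
keyword_hits = ["doc_c", "doc_a", "doc_d"]   # ranked by full-text / BM25 score

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF uses only ranks, so you do not have to put cosine distances and keyword scores on a common scale before combining them.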

Common mistakes and migration

  • Premature migration to a dedicated database — standing up Pinecone/Qdrant for 50 thousand vectors that would comfortably fit in pgvector; you add infrastructure and cost with no benefit
  • IVFFlat on an empty table — an index built before loading data gives terrible recall; build IVFFlat after inserting data, in a post-load script, not in a migration
  • Mismatched embedding dimension — a vector(1536) column and a model returning 3072 dimensions is an insert error; changing the embedding model requires rebuilding the column and reindexing
  • Ignoring the cost of dimensionality — larger embeddings mean linearly larger memory and storage cost; at large scale consider a smaller embedding model or quantization
  • No permission filtering — search returning documents without checking whether the user has access is a data breach; a permission-metadata filter is mandatory
  • Mixing episodic and semantic memory in one index — described in the AI memory article; it degrades search quality
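The permission point from the list above deserves a concrete shape. The sketch assumes each stored chunk carries an `allowed_groups` metadata field (a hypothetical name) and that the user's groups are known at query time. The important detail is the order: filter first, truncate to K afterwards.

```python
def visible_results(hits, user_groups, k=5):
    """Keep only hits the user may see, then cut to k. Hits with no groups are denied."""
    user_groups = set(user_groups)
    allowed = [h for h in hits if user_groups & set(h["allowed_groups"])]
    return allowed[:k]


# Hits as they might come back from similarity search, best match first.
hits = [
    {"id": 1, "allowed_groups": ["hr"]},
    {"id": 2, "allowed_groups": ["finance"]},
    {"id": 3, "allowed_groups": ["hr", "finance"]},
    {"id": 4, "allowed_groups": []},
]

print([h["id"] for h in visible_results(hits, {"hr"})])
```

In production the same condition is better expressed inside the database query, as a WHERE clause or a payload filter, so that restricted chunks never leave the database; a filter in application code like this one is a last line of defense.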

Migrating from pgvector to a dedicated database, when the time comes, is simple: you export vectors with metadata, import them into the new database, and repoint the retrieval layer in your app. Because the embeddings are already computed, you do not regenerate them as long as you keep the same embedding model; you migrate the data rather than recompute it.
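The export and import loop can be sketched independently of any particular target database. In the example below `upsert` is a hypothetical stand-in for whatever bulk-write call your target client provides, and the rows are assumed to be already exported as dicts with an id, the stored embedding and the metadata.

```python
def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(rows, upsert, batch_size=500):
    """Send exported rows to the target store in batches; return how many were sent."""
    total = 0
    for batch in batches(rows, batch_size):
        upsert(batch)   # hypothetical bulk-write call of the target database client
        total += len(batch)
    return total


# Stand-in target: collect the batches in memory instead of calling a real client.
received = []
exported = ({"id": i, "embedding": [0.0, 1.0], "department": "HR"} for i in range(1201))
sent = migrate(exported, received.append, batch_size=500)
print(sent, [len(b) for b in received])
```

Comparing the returned count with a row count from the source table is a cheap check that nothing was dropped on the way.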

Vector database selection and deployment checklist

  1. Start with pgvector if you have < 10M vectors and already use Postgres; do not add infrastructure unnecessarily
  2. Move to a dedicated database when you exceed ~10M vectors or vector search is your primary workload
  3. Choose the dedicated one by need: Qdrant (performance, filtering), Weaviate (hybrid search), Milvus (100M+ scale), Pinecone (zero DevOps)
  4. Use the HNSW index as the default up to ~1M rows: the best speed/recall trade-off
  5. IVFFlat only for large, batch-loaded sets, and build it AFTER loading data, not in a schema migration
  6. Match the column dimension to the embedding model (1536 / 3072); a mismatch is an insert error
  7. Plan cost by scale: managed at the start (time savings), self-host at scale (5–10× savings)
  8. Watch hidden costs: embedding dimensionality, replicas, unused indexes; bills can run 2.5–4× over budget
  9. Add permission-metadata filtering: a user must not get a document they have no access to in the results
  10. Consider hybrid search if exact matches matter (numbers, proper names), not just semantics
  11. Do not migrate prematurely: do it when a specific limitation forces you, not "just in case"
  12. When migrating, export the computed embeddings; do not recompute them

Key takeaways

A vector database is the foundation of RAG and AI memory — it stores texts as embeddings and finds the most semantically similar. The most important decision is not "which provider", but "do I even need a dedicated database": for most companies and sets under 10M vectors pgvector in Postgres is the best start, because vectors live next to application data. Choose a dedicated database at scale or for special requirements: Qdrant (performance, ~4 ms), Weaviate (hybrid search), Milvus (billion scale), Pinecone (zero DevOps). Use the HNSW index by default; IVFFlat only for large batch sets and never on an empty table. Plan cost by scale (managed at the start, self-host 5–10× cheaper at scale), match the embedding dimension to the model, add permission filtering and do not migrate prematurely.

---

I help companies choose, deploy and optimize a vector database for RAG and AI memory — from the pgvector vs dedicated decision, through index and embedding-model selection, to hybrid search, permission filtering and cost optimization. Get in touch — I start with a free 30-minute analysis of your use case.

/// AUTHOR
Paweł Wiszniewski – AI & Web Engineer

Paweł Wiszniewski

SEO & GEO Specialist & AI Engineer

SEO/GEO specialist (10 years) and AI engineer (3 years). I build search visibility, AI systems and automations that reduce costs and improve operational efficiency.
