How is a vector database different from a regular database?

A regular (SQL) database searches by exact values and conditions: "find order number X", "customers from Warsaw". A vector database searches by semantic similarity: "find the document chunks whose meaning best matches this question", even if they share no keyword. Under the hood it stores texts as embeddings (vectors of numbers encoding meaning) and uses approximate nearest-neighbor indexes (HNSW, IVFFlat) to find the closest vectors in milliseconds. Importantly, you do not have to choose: pgvector adds vector search to PostgreSQL, so you get both worlds in one database. That is why pgvector is the best start for most companies.

Is pgvector enough, or do I need Pinecone or Qdrant?

For most companies pgvector is enough — with plenty of headroom. Its biggest strength is not performance but integration: vectors live next to application data, so one SQL query combines semantic search with user, order or permission data, without maintaining a second system. Below 1M vectors pgvector with an HNSW index matches dedicated databases, and with tuning it handles up to ~10M. Choose a dedicated database (Qdrant, Weaviate, Milvus, Pinecone) when you exceed ~10M vectors, when vector search is your app's primary workload, or when you need features beyond pgvector — advanced hybrid search, sharding or GPU. The rule: start with pgvector, migrate when a specific limitation forces you.

HNSW or IVFFlat — which index should I choose?

HNSW is the safe default for applications under 1M rows: better speed/recall trade-off, scales logarithmically, works on an empty table and handles updates well. Its cost is slower index builds and more memory. Consider IVFFlat at scale — when you have millions of rows loaded in bulk and care more about fast builds and lower memory than maximum query speed. The critical trap: IVFFlat must "learn" cluster centers from data, so never build it on an empty table or in a schema migration — build it in a script run AFTER loading the data. Built on an empty table it gives terrible recall. For 90% of deployments the answer is: HNSW.

How much does a vector database cost in production?

It depends on scale and hosting model. At 10M vectors the differences are small: pgvector on RDS ~$45/mo, Qdrant Cloud ~$65, Pinecone Serverless ~$70, Weaviate Cloud ~$135. At 100M vectors the gap opens up: Pinecone can reach $700+/mo, while self-hosted Milvus or pgvector stays under $100. That is why managed SaaS pays off at the start (saving DevOps time with a small team) and self-hosting at scale (saving 5–10× at high traffic). Watch hidden costs: bills regularly run 2.5–4× over budget when you do not watch embedding dimensionality, replica count and unused indexes. A 3072-dimension embedding takes twice the storage and memory of a 1536-dimension one.

What is hybrid search and when do I need it?

Hybrid search combines vector search (by meaning) with keyword search (by exact match), usually fusing results with the Reciprocal Rank Fusion algorithm. You need it when queries contain exact matches that semantics miss: contract numbers (ORD-2026-0042), proper names, product codes, technical terms. Pure vectors understand meaning well but hit a specific string poorly. Among the databases, Weaviate has the best native hybrid search; Qdrant supports it via sparse + dense vectors; in pgvector you combine vector search with Postgres's built-in full-text search but fuse results manually. If your users mostly search for concepts and questions — pure vectors are enough; if they mix concepts with specific identifiers — add hybrid search.

How do I migrate from one vector database to another?

Simpler than it seems, because you already have the embeddings computed. The process: (1) export vectors along with the original text and metadata from the old database; (2) import them into the new one (most databases have import tools or batch APIs); (3) build the index on the new database — for IVFFlat only after loading data; (4) repoint the retrieval layer in your app to the new endpoint. Key: you do not recompute embeddings — you migrate ready vectors, saving time and embedding-model call costs. The only case where you must recompute: changing the embedding model (different dimension or vector space). Plan the migration with a window where both systems run in parallel, and switch traffic only after verifying recall on the new database.
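The recall check before switching traffic can be sketched as below. This is a minimal illustration, assuming you replay the same test queries against both databases and collect the returned document IDs; the function names and sample IDs are hypothetical and not part of any database client.

```python
def recall_at_k(reference_ids, candidate_ids, k):
    """Share of the reference top-k that also appears in the candidate top-k."""
    reference = set(reference_ids[:k])
    if not reference:
        return 1.0  # nothing to find, so nothing was missed
    found = reference & set(candidate_ids[:k])
    return len(found) / len(reference)


def mean_recall(result_pairs, k):
    """Average recall@k over (old_db_ids, new_db_ids) pairs, one pair per test query."""
    return sum(recall_at_k(old, new, k) for old, new in result_pairs) / len(result_pairs)


# Hypothetical results for two test queries: (IDs from the old DB, IDs from the new DB)
result_pairs = [
    ([1, 2, 3, 4, 5], [1, 2, 3, 5, 9]),   # 4 of 5 reference hits found
    ([7, 8, 9, 10, 11], [7, 8, 9, 10, 11]),  # identical results
]
score = mean_recall(result_pairs, k=5)
```

Treating the old database as the reference measures agreement between the two systems, not absolute quality; if the old index was itself approximate, an exact (brute-force) search on a sample is the stricter baseline.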

Which embedding model should I choose for a vector database?

This is a separate decision from choosing the database, but related, because the model determines the column dimension and search quality. Popular 2026 choices: OpenAI text-embedding-3-small (1536 dimensions, cheap, a good default) and text-embedding-3-large (3072, better quality, twice the storage cost). For many languages, including Polish, multilingual models work well (e.g. Cohere embed-multilingual, BGE-M3). When self-hosting (the Ollama/vLLM article) you run a local embedding model and data never leaves your infrastructure. Rules: a larger dimension usually means better quality but linearly higher memory and storage cost; test quality on your own data (the eval from the evaluation article), not on general benchmarks; and remember that changing the embedding model requires recomputing the whole database, so the initial choice matters.


2026-06-16 · AI & Automation · 15 min

Vector Databases — pgvector, Pinecone, Qdrant or Weaviate? How to Choose for RAG and AI Memory

Paweł Wiszniewski

SEO & GEO Specialist · AI Engineer

A vector database is a store that holds texts (documents, knowledge chunks, agent memory) as embeddings — vectors of numbers — and instantly finds the ones most semantically similar to a query. It is the foundation of every RAG and AI memory system. Which one to choose? The rule by scale: up to ~10 million vectors the best choice is usually pgvector (an extension to the Postgres you probably already run) — vectors live next to your application data and you add no new infrastructure. Beyond 10M, or when vector search is your primary workload rather than a feature — move to a dedicated database: Qdrant for performance and filtering, Weaviate for hybrid search, Milvus for billion scale, Pinecone when you want zero infrastructure ops.

The complete guide to vector databases: what they are and why you need them for RAG, when pgvector in Postgres is enough and when you need a dedicated database, a pgvector vs Pinecone vs Qdrant vs Weaviate vs Milvus comparison (latency, cost, scale), the difference between HNSW and IVFFlat indexes, deployment code in pgvector, hybrid search and metadata filtering, and common migration mistakes.

You build a RAG chatbot on company documentation. It works great on 50 documents in the prototype. You deploy it on 50 thousand — and suddenly search takes seconds, database costs grow faster than traffic, and filtering by department or date does not work the way you thought. The problem is not the model. The problem is the layer you chose hastily at the start: the vector database.

This choice comes up in almost every article in this series — in RAG, in agent memory, in self-hosting — but it has never been covered on its own. Time to fix that. This article shows how vector databases work, which to choose for your scale, how the indexes differ, what it really costs and how to deploy it all in practice — with code.

What a vector database is and why you need one

AI models do not understand text directly — they turn it into embeddings, vectors of a few hundred to a few thousand numbers that encode meaning. Sentences with similar meaning have vectors close together in space, even if they use completely different words. "What is the notice period" and "after how many months can I terminate the contract" land next to each other despite sharing no keyword.

A vector database does three things:

Stores embeddings: vectors along with the original text and metadata (source, date, department, permissions)
Searches by similarity: for a query turned into a vector, it finds the top-K nearest vectors in the database (similarity search), usually by cosine similarity
Filters by metadata: combines semantic search with hard conditions ("only documents from 2026", "only the HR department")

This is exactly the retrieval mechanism RAG stands on (described in the article on building a knowledge base) and the agent's semantic memory (the AI memory article). Without an index you would have to compare a query against every stored vector one by one, which gets slower linearly as the collection grows and becomes impractical at millions of vectors. A vector database with an index does it in milliseconds.
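To make the mechanism concrete, here is a minimal brute-force sketch of similarity search: score every stored vector against the query with cosine similarity and keep the top K. The three-dimensional vectors are made-up toy values (real embeddings have hundreds or thousands of dimensions), and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Brute-force search: score every row, return the k most similar texts."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings" with hypothetical values chosen for illustration only
rows = [
    ("What is the notice period", [0.9, 0.1, 0.0]),
    ("After how many months can I terminate the contract", [0.8, 0.2, 0.1]),
    ("Office parking rules", [0.0, 0.1, 0.9]),
]
query = [0.85, 0.15, 0.05]
results = top_k(query, rows, k=2)
```

This loop touches every row on every query. An index such as HNSW exists to avoid exactly that full scan, at the price of approximate results.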

pgvector vs a dedicated database — the first decision

Choosing a vector database by scale: start with pgvector, move to a dedicated DB at scale

Scale	Recommendation	Why
< 1M vectors	pgvector + HNSW	The Postgres you already run; vectors next to app data
1–10M vectors	pgvector or Qdrant	pgvector still fits; Qdrant when latency and filtering matter
10–100M vectors	Qdrant / Weaviate	Dedicated DB; hybrid search, horizontal scaling
> 100M vectors	Milvus / Qdrant	Billion scale, GPU, sharding; self-host 5–10× cheaper

Key figures: 10M vectors is the threshold to consider a dedicated DB; self-hosting is 5–10× cheaper than managed at scale; HNSW is the safe default index up to 1M rows.

The most important decision is not "Pinecone or Qdrant", but "do I even need a dedicated vector database". For most companies the answer is: not yet. pgvector — an extension for PostgreSQL — lets you keep vectors in the database you probably already run and operate.

pgvector's biggest strength is not performance, it is integration: vectors live next to application data. A single SQL query combines semantic search with joins to user, order or permission tables. You do not sync two systems, do not maintain separate infrastructure, do not pay for another SaaS. For sets under 1M vectors pgvector with an HNSW index matches dedicated databases, and with proper tuning it handles up to ~10M.

A dedicated vector database starts paying off when:

You exceed ~10M vectors — at that scale dedicated engines (Qdrant, Milvus) win on performance by an order of magnitude
Vector search is your primary workload — not an add-on to the app but its core; then a specialized tool is worth it
You need features pgvector lacks natively — advanced hybrid search, horizontal scaling (sharding), GPU, vector quantization
You have very high QPS — thousands of queries per second at high recall

The practical rule: start with pgvector, migrate to a dedicated database only when a specific limitation forces you to. Premature migration is the classic example of solving a problem you do not have yet.

Comparison: pgvector, Pinecone, Qdrant, Weaviate, Milvus

pgvector vs Qdrant vs Pinecone vs Weaviate vs Milvus at a glance: pgvector (default), Qdrant (performance), Pinecone (managed), Weaviate (hybrid), Milvus (scale). Best p50 latency: ~4 ms (Qdrant). Cost gap between options at 100M vectors: 10×. Open source: pgvector, Qdrant, Weaviate, Milvus.

Once you have decided you need a dedicated database (or you are comparing deliberately), here is how the main players stack up in 2026:

Database	Type	Latency p50	Strength	Best for
pgvector	Postgres extension	Good to 1M	Vectors next to data, no new infra	Most companies, < 10M vectors
Qdrant	Dedicated (Rust)	~4 ms	Lowest latency, filtering, free tier	Performance and metadata filtering
Pinecone	Serverless SaaS	< 10 ms	Easiest to operate, zero DevOps	Startups, when velocity > cost
Weaviate	Dedicated (Go)	Good	Best hybrid search	Vector + keyword together
Milvus	Dedicated, GPU	~6 ms	Billion scale, sharding	Very large datasets (100M+)

How to read this:

Pinecone is the easiest to operate and pays off when development velocity outpaces infrastructure cost — typical for startups. At enterprise scale it is hard to justify: the same workload on self-hosted Qdrant or Milvus costs 5–10× less
Qdrant offers the lowest latency (~4 ms p50) and the best free tier; great when metadata filtering during search matters
Weaviate has the best hybrid search — combining vector similarity with keyword matching in one query
Milvus is the choice for billion scale: GPU acceleration, sharding, extreme throughput — but it demands the most DevOps knowledge

HNSW vs IVFFlat indexes — which to choose

The database itself is not everything — the index is key, deciding the trade-off between speed, accuracy (recall) and memory use. In pgvector (and most databases) you have two main options:

Trait	HNSW	IVFFlat
Query speed	Higher, scales logarithmically	Lower at high recall
Accuracy (recall)	Better speed/recall trade-off	Weaker trade-off
Index build	Slower, more memory	Faster, less memory
Empty table	Works immediately	Needs data before building
Updates	Handles well	Worse with frequent changes
When to use	Default choice up to ~1M rows	Large, rarely-changed batch-loaded sets

Practical guidance:

HNSW is the safe default for applications under 1M rows — better speed/recall trade-off, works on an empty table, handles updates well
Consider IVFFlat at scale, when you have millions of rows loaded in bulk and care more about fast index builds and lower memory than maximum query speed
Critical production trap: an IVFFlat index does not belong in a schema migration — it belongs in scripts run after the data load. IVFFlat must "learn" cluster centers from existing data; built on an empty table it gives terrible recall
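A post-load script for IVFFlat could look roughly like this. It is a sketch, not a definitive implementation: it uses the starting rule of thumb from the pgvector README for `lists` (rows / 1000 up to 1M rows, square root of the row count above that) and refuses to build on an empty table. The function names are hypothetical, and the table and column names are assumed to be trusted identifiers, not user input.

```python
import math


def ivfflat_lists(row_count):
    """Starting value for `lists`: rows / 1000 up to 1M rows, sqrt(rows) above that."""
    if row_count <= 0:
        raise ValueError("IVFFlat needs data: load rows before building the index")
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)


def ivfflat_index_sql(table, column, row_count):
    """Build the CREATE INDEX statement to run AFTER the bulk load."""
    lists = ivfflat_lists(row_count)
    return (
        f"CREATE INDEX ON {table} USING ivfflat "
        f"({column} vector_cosine_ops) WITH (lists = {lists});"
    )


# Example: 4 million rows already loaded into a hypothetical documents table
statement = ivfflat_index_sql("documents", "embedding", 4_000_000)
```

In a real script the row count would come from a `SELECT count(*)` on the loaded table, which is what keeps this step out of the schema migration.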

What it really costs

Cost varies dramatically with scale and hosting model. Approximate figures for typical configurations:

Scale	pgvector (RDS)	Qdrant Cloud	Pinecone Serverless	Weaviate Cloud
10M vectors	~$45/mo	~$65/mo	~$70/mo	~$135/mo
100M vectors	< $100/mo (self-host)	moderate	$700+/mo	high

The key observation: at 10M vectors the differences are small (tens of dollars), but at 100M the gap opens up. Pinecone can reach $700+/mo, while self-hosted Milvus or pgvector stays under $100. That is why managed SaaS pays off at the start (you save DevOps time while the team is small) and self-hosting wins at scale (you save money as traffic grows).

Watch the hidden costs of managed services — vector database bills regularly run 2.5–4× over budget when you do not watch embedding dimensionality, replica count and unused indexes. A 3072-dimension embedding takes twice the space of 1536 — at 100M vectors that is a real difference on the bill.
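The effect of dimensionality can be estimated with simple arithmetic. The sketch below assumes pgvector's documented storage size for the `vector` type (4 bytes per dimension plus 8 bytes per value) and counts raw vector data only; indexes, the original text, row overhead and replicas come on top. The function names are illustrative.

```python
def vector_bytes(dimensions):
    """Storage for one pgvector `vector` value: 4 bytes per dimension plus 8 bytes."""
    return 4 * dimensions + 8


def raw_storage_gb(num_vectors, dimensions):
    """Raw vector storage in GB (10**9 bytes), excluding indexes and other columns."""
    return num_vectors * vector_bytes(dimensions) / 1e9


small = raw_storage_gb(100_000_000, 1536)  # 1536-dimension model at 100M vectors
large = raw_storage_gb(100_000_000, 3072)  # 3072-dimension model at 100M vectors
ratio = large / small
```

At 100M vectors this works out to roughly 615 GB versus roughly 1,230 GB of raw vector data, before any index is built.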

How to deploy — pgvector in practice

The simplest production setup: PostgreSQL with pgvector, a table with embeddings, an HNSW index and a similarity-search query with metadata filtering. The whole setup is a dozen-odd lines of SQL:

setup_pgvector.sql

-- 1. Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Table: text + embedding + metadata side by side
CREATE TABLE documents (
    id          bigserial PRIMARY KEY,
    content     text NOT NULL,
    department  text,
    created_at  timestamptz DEFAULT now(),
    embedding   vector(1536)   -- dimension depends on the embedding model
);

-- 3. HNSW index (the default choice up to ~1M rows)
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops);

-- 4. Semantic search + metadata filter in one query
--    $1 = the query vector from the embedding model
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE department = 'HR'               -- hard metadata filter
  AND created_at > now() - interval '1 year'
ORDER BY embedding <=> $1             -- <=> is cosine distance
LIMIT 5;

Three things worth noting:

The <=> operator is cosine distance: pgvector also has <-> (Euclidean distance) and <#> (negative inner product); for text embeddings you usually use cosine
The WHERE filter works together with the search: this is pgvector's edge, because you combine semantics with hard SQL conditions without juggling two systems
vector(1536) is the embedding dimension: it must match the model (OpenAI text-embedding-3-small: 1536, large: 3072); changing the model means rebuilding the column. Note that pgvector indexes on the vector type support at most 2,000 dimensions, so a 3072-dimension model needs the halfvec type or shortened embeddings before it can be indexed

You generate vectors separately — by calling an embedding model (OpenAI, Cohere, or a local one when self-hosting from the Ollama/vLLM article) — and insert them into the embedding column. The rest is plain SQL.
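On the application side, the insert step can be sketched as follows. This assumes the embedding arrives as a plain list of floats and uses pgvector's text input format, `[x1,x2,...]`; the helper name and the dimension check are illustrative, not part of any client library.

```python
def to_pgvector_literal(embedding, expected_dim=1536):
    """Format a list of floats as pgvector text input, e.g. '[0.1,0.2,0.3]'.

    Raises early if the length does not match the column dimension,
    instead of letting the database reject the insert.
    """
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Toy 3-dimensional example; a real call would use the model's full output
literal = to_pgvector_literal([0.1, 0.2, 0.3], expected_dim=3)

# Assumed usage with a Postgres driver (not executed here):
#   INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)
#   with parameters (text, literal)
```

Passing the literal as a bound parameter rather than pasting it into the SQL string keeps the insert safe from injection and lets the driver handle quoting.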

Hybrid search and metadata filtering

Pure vector search has a weak spot: it misses exact matches. When a user searches for a specific contract number "ORD-2026-0042" or a proper name, semantics do not help — you need a keyword match. The solution is hybrid search: combining vector search (meaning) with full-text or BM25 search (exact words), fusing results with an algorithm like Reciprocal Rank Fusion.

Weaviate has the best native hybrid search among the compared databases — it is its main advantage
Qdrant supports hybrid search via sparse + dense vectors
pgvector you combine with Postgres's built-in full-text search (tsvector) — it works, but you fuse results manually
Metadata filtering is a separate, equally important feature — restricting search to documents of a given department, date or permission level; critical for security (a user should not get a document they have no access to in the results)

I wrote more on hybrid search and reranking in the advanced RAG article — that is where these techniques give the biggest jump in search quality.
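When you fuse results yourself, as with pgvector plus full-text search, Reciprocal Rank Fusion fits in a few lines. The sketch below assumes each retriever returns an ordered list of document IDs, best first, and uses the conventional constant k = 60; the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the two retrievers, best match first
vector_hits = ["doc_a", "doc_b", "doc_c"]    # semantic search
keyword_hits = ["doc_c", "doc_a", "doc_d"]   # full-text search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF works on ranks only, so it does not matter that cosine distances and full-text scores live on different scales. Documents that appear in both lists rise to the top.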

Common mistakes and migration

Premature migration to a dedicated database — standing up Pinecone/Qdrant for 50 thousand vectors that would comfortably fit in pgvector; you add infrastructure and cost with no benefit
IVFFlat on an empty table — an index built before loading data gives terrible recall; build IVFFlat after inserting data, in a post-load script, not in a migration
Mismatched embedding dimension — a vector(1536) column and a model returning 3072 dimensions is an insert error; changing the embedding model requires rebuilding the column and reindexing
Ignoring the cost of dimensionality — larger embeddings mean linearly larger memory and storage cost; at large scale consider a smaller embedding model or quantization
No permission filtering — search returning documents without checking whether the user has access is a data breach; a permission-metadata filter is mandatory
Mixing episodic and semantic memory in one index — described in the AI memory article; it degrades search quality
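The permission-filtering point deserves a concrete shape. In this minimal sketch the filter runs before ranking, so a document the user cannot see never enters the candidate set. The rows, group names and the use of a dot product as the similarity score are assumptions for illustration.

```python
def dot(a, b):
    """Dot product, used here as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search_with_permissions(query, rows, user_groups, k=2):
    """Filter by permission metadata first, then rank only what the user may see."""
    allowed = [row for row in rows if row["group"] in user_groups]
    allowed.sort(key=lambda row: dot(query, row["embedding"]), reverse=True)
    return [row["id"] for row in allowed[:k]]


# Hypothetical documents; id 2 is the best semantic match but belongs to finance
rows = [
    {"id": 1, "group": "hr", "embedding": [0.9, 0.1]},
    {"id": 2, "group": "finance", "embedding": [1.0, 0.0]},
    {"id": 3, "group": "hr", "embedding": [0.2, 0.8]},
]
visible = search_with_permissions([1.0, 0.0], rows, user_groups={"hr"})
```

Filtering after the top-K cut would be weaker: it can return fewer than K results, and it relies on every caller remembering to apply the check. In SQL the equivalent is a WHERE clause on the permission column in the same query as the vector search.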

Migrating from pgvector to a dedicated database, when the time comes, is simple: you export vectors with metadata, import into the new database, repoint the retrieval layer in your app. Because you already have the embeddings computed, you do not regenerate them — you migrate the data, not recompute it.

Vector database selection and deployment checklist

1. Start with pgvector if you have < 10M vectors and already use Postgres; do not add infrastructure unnecessarily
2. Move to a dedicated database when you exceed ~10M vectors or vector search is your primary workload
3. Choose the dedicated one by need: Qdrant (performance, filtering), Weaviate (hybrid search), Milvus (100M+ scale), Pinecone (zero DevOps)
4. Use the HNSW index as the default up to ~1M rows; it has the best speed/recall trade-off
5. IVFFlat only for large, batch-loaded sets, and build it AFTER loading data, not in a schema migration
6. Match the column dimension to the embedding model (1536 / 3072); a mismatch is an insert error
7. Plan cost by scale: managed at the start (time savings), self-host at scale (5–10× savings)
8. Watch hidden costs: embedding dimensionality, replicas, unused indexes; bills can run 2.5–4× over budget
9. Add permission-metadata filtering; a user must not get a document they have no access to in the results
10. Consider hybrid search if exact matches matter (numbers, proper names), not just semantics
11. Do not migrate prematurely; do it when a specific limitation forces you, not "just in case"
12. When migrating, export the computed embeddings; do not recompute them

Key takeaways

A vector database is the foundation of RAG and AI memory — it stores texts as embeddings and finds the most semantically similar. The most important decision is not "which provider", but "do I even need a dedicated database": for most companies and sets under 10M vectors pgvector in Postgres is the best start, because vectors live next to application data. Choose a dedicated database at scale or for special requirements: Qdrant (performance, ~4 ms), Weaviate (hybrid search), Milvus (billion scale), Pinecone (zero DevOps). Use the HNSW index by default; IVFFlat only for large batch sets and never on an empty table. Plan cost by scale (managed at the start, self-host 5–10× cheaper at scale), match the embedding dimension to the model, add permission filtering and do not migrate prematurely.

---

I help companies choose, deploy and optimize a vector database for RAG and AI memory — from the pgvector vs dedicated decision, through index and embedding-model selection, to hybrid search, permission filtering and cost optimization. Get in touch — I start with a free 30-minute analysis of your use case.


