Can text-to-SQL accidentally delete or change data in the database?

No, if the system is built correctly. The standard is to connect to the database through an account with read-only privileges, ideally on a separate replica. The database will physically reject any command other than SELECT — even if the model generated an UPDATE or DELETE. On top of that, the query passes validation before execution. Safety doesn't depend on the model's "good manners", but on database-level permissions the model cannot bypass.

How is text-to-SQL different from RAG and a document chatbot?

They're two different technologies for two different types of data. RAG works on unstructured data — documents, PDFs, emails, a knowledge base — and answers with quotes from text. Text-to-SQL works on structured data in a database and answers with computed results: sums, averages, lists, trends. "What's our return policy?" is RAG. "How many returns did we have in March?" is text-to-SQL. Mature systems combine both approaches in a single interface.

What accuracy is realistic on my company's database?

It depends primarily on schema quality and whether a semantic layer was built. Naively connecting the model to a raw schema often gives 50–60% on a real company database. With dynamic few-shot, validation and a semantic layer, well-built systems reach over 90%. For comparison, on the BIRD benchmark top models reach ~82% and a human expert ~93%. The best way to know the real number for your company is a test on a set of your actual questions — that's what I do at the start of every deployment.

Does my data leave for OpenAI or another provider?

The schema (table and column names) and the question itself go to the cloud model — and in some architectures also the result, to phrase the answer. The data rows themselves don't have to leave your infrastructure if the answer is phrased locally or returned raw. For sensitive data I recommend a self-hosted model (e.g. open-source running on your own server) — then nothing goes out. It's an architectural decision we make at the start of the project based on data classification and GDPR.

Do I need a data warehouse to deploy this?

No. Text-to-SQL works on any SQL database: PostgreSQL, MySQL, MS SQL Server, and on the database under your ERP or CRM. A warehouse (BigQuery, Snowflake) helps at large scale and with many sources, but for most SMEs an existing production database with a read-only replica is enough. If your data is scattered across several systems it's sometimes worth consolidating first — but that's a separate, optional step, not an entry requirement.

What if the model returns an answer that looks good but is wrong?

That's a real risk and the single most important reason architecture matters. The safeguards I use: query validation before execution, a self-correction loop after errors, a semantic layer that limits guessing, and — on sensitive questions — showing the generated SQL for approval (human-in-the-loop). For high-stakes decisions we never rely on the natural-language answer alone without the ability to verify how it was produced. 90%+ accuracy is great for self-service analytics, but critical financial reports still require verification.

RETURN_TO_BLOG

2026-06-25Updated: 2026-06-25AI & Automation 12 min

Talk to Your Database — Text-to-SQL, the AI That Turns Questions into SQL Queries

Text-to-SQL (also called NL2SQL or conversational analytics) is a technique where you ask a question in natural language — plain English, an ordinary sentence — and an AI system translates it into a SQL query, runs it against your database, and returns the answer. Instead of waiting two days for an analyst to write a query and build a report, you ask "how many customers haven't placed an order in three months?" and get the list in seconds. This isn't ChatGPT hard-wired to a database — it's an architecture where three things are critical: context (the model has to understand your schema and business terms), safety (read-only, never write), and verification (the query is checked before it runs). A well-built system reaches over 90% accuracy. A naive one — wiring the model straight to the database — fails spectacularly.

Your data sits in a database, ERP or warehouse — but to answer a simple business question, someone has to write SQL or build a report. Text-to-SQL flips that around: you ask in plain English, the AI generates the query, runs it read-only and returns the answer with a chart. I explain how it works, why the naive approach fails, how a semantic layer pushes accuracy from 50% to over 90%, and how to build it safely.

The most common scene on a consultation: a business owner shows me a dashboard in their ERP and says "all my data is here, but when I want to answer a specific question I have to ask IT, or export to Excel and calculate by hand". The data exists. The problem is access to it. Every non-standard question requires either a pre-built report (which doesn't exist) or someone who knows SQL (who is busy). Text-to-SQL solves exactly that gap.

In this article I explain how the technology works under the hood, where it genuinely delivers, what its limits are, and how to build it so it doesn't end in a data disaster. No sugar-coating — including when it's NOT worth it.

The problem it actually solves

A company's operational data lives in databases: PostgreSQL, MySQL, MS SQL Server, a warehouse like BigQuery or Snowflake, or the database under your ERP (Optima, Subiekt, SAP). The classic path to an answer looks like this: a business user has a question → they raise it with IT or an analyst → someone writes a SQL query → builds a report → sends back the result. This loop takes hours to days, and for unusual questions it often dies at "we don't have a report for that".

The result: 80% of a company's data is practically inaccessible to the people making decisions. Not because the data doesn't exist — because the technical barrier (knowing SQL and the database structure) cuts everyone off from it except a narrow group.

Text-to-SQL removes that barrier. A question asked in natural language becomes a database query. Someone in sales, finance, or management asks directly and gets an answer — no intermediary.

How it works under the hood — the question-to-answer loop

/// TEXT-TO-SQL: THE QUESTION-TO-ANSWER LOOP

Question in plain English

"How much did we earn from client X in Q1?"

↓

Context + schema

Schema RAG, semantic layer, few-shot examples

↓

LLM generates SQL

Model writes a query in your DB dialect

↓

Validation

SELECT-only, row limit, syntax check

↓

Read-only execution

On a replica — zero write risk

↓

Answer + chart

Result translated back to natural language

* On execution error, step 04→03 closes a self-correction loop: the model sees the error message and fixes the query.

Contrary to appearances, this is not "send the question to GPT and run whatever it returns". A production text-to-SQL system is a pipeline with several layers:

1. Understanding the question and pulling context. The system doesn't send the whole database to the model — that would be too expensive and ineffective. Instead it uses schema RAG: based on the question, it selects only the tables and columns that matter. It adds examples of similar queries from history (few-shot) and definitions of business terms from a semantic layer.

2. Generating SQL. The language model (GPT-4o, Claude, or an open-source model) writes the query in your database dialect — because SQL in PostgreSQL differs from SQL in MS SQL Server. This is the step where the context from point one decides everything.

3. Validation before execution. Before anything touches the database, the query is checked: is it definitely SELECT-only (no UPDATE/DELETE/DROP)? Does it have a row limit? Is the syntax valid? Does it reference columns the user isn't allowed to see?

4. Read-only execution. The query runs against a database replica with read-only privileges. Even if the model hallucinated a destructive command, the database physically won't let it run.

5. Translating the result. The raw result (a table of numbers) goes back to the model, which phrases the answer in natural language and — if useful — suggests a chart.

The self-correction loop. If a query returns an execution error, the system shows the model the error message and asks for a fix. That single loop can raise accuracy by double-digit percentage points — because the model learns from its own mistake in real time.

Why the naive approach fails — and what actually drives accuracy

/// WHAT DRIVES TEXT-TO-SQL ACCURACY

Naive: LLM + raw schema~50%

Model guesses column meaning and relations

+ Dynamic few-shot~70%

Examples of similar queries from history

+ Validation & self-correction~80%

Fix loop after an execution error

+ Semantic layer & glossary90%+

Business metrics & terms defined once

* Indicative figures. For reference: on the BIRD benchmark top models (GPT-4o) reach ~82%, human experts ~93%. Real company databases are often harder than the benchmark.

Here's the crux that most demos hide. Wiring the model straight to the database ("here's the schema, here's the question, write SQL") works great on a simple database with three tables and readable names. On a real company database — with a table named tdok, a column knt_gidnumer, and three different fields that could each mean "amount" — the naive model guesses. And when it guesses, it returns an answer that looks correct but is wrong. That's the most dangerous outcome: not an error, but plausible-looking falsehood.

What genuinely drives accuracy, from cheapest to most valuable:

Dynamic few-shot — attaching examples of similar, proven queries from history to the prompt. The model sees the pattern "this is how we calculate revenue here" and imitates it.
Validation and self-correction — a loop where a faulty query is fixed based on the database's error message.
Semantic layer — this is the game-changer. Instead of letting the model guess every time what an "active customer" or "margin" is, you define those terms ONCE, in a business glossary. The model no longer translates the raw schema — it translates the question into defined metrics. Research shows up to a 3x accuracy gain when grounding the model in a business glossary rather than schema alone.

For the record — benchmarks: on the BIRD dataset (realistic, hard databases) top models like GPT-4o reach around 82% accuracy, while a human expert reaches ~93%. That shows two things at once: the technology is already very useful, but not infallible. Which is why the architecture around the model (context, validation, semantic layer) decides whether in your company this is 55% or 92%.

Safety — the most important chapter everyone skips

Letting a language model touch a production database introduces two distinct risks that must be closed with architecture, not a promise.

Write risk. A hallucinated UPDATE or DELETE could corrupt live data. The fix isn't soft ("we'll ask the model not to do that") — it's hard: the system connects to the database through an account with read-only privileges, ideally on a separate replica. The database will physically reject any command other than SELECT. It's a layer the model can't bypass, because it doesn't depend on the model.

Data exposure risk. A query could return rows or columns a given user shouldn't see — salaries, sensitive data, other departments' data. The fix: row-level and column-level security at the database layer, plus mapping the app user's permissions onto database permissions. A salesperson asking about "revenue" sees their regions, not the whole company.

On top of that: a full audit trail. Every generated query and every result is logged. Without enforced read-only and without auditing, most NL2SQL prototypes fail a compliance review — and rightly so.

Security layer	What it does	Why it's essential
Read-only account + replica	Physically blocks writes and offloads production	A hallucinated UPDATE/DELETE has no way to execute
Query validation	Allows only SELECT, enforces a row limit	Stops destructive and overly costly queries
Row/column-level security	Filters data to the user's permissions	A salesperson won't see salaries or another department's data
Audit log	Logs the question, SQL and result	Compliance, accountability, debugging wrong answers
Human-in-the-loop	Shows the generated SQL before running	Builds trust and catches errors on sensitive queries

What it looks like in practice — an example

Imagine a question from someone in sales: "show the ten customers with the biggest year-over-year drop in orders". The system pulls the schema of the orders and customers tables, adds the definition of "year-over-year drop" from the semantic layer, and generates the query:

generated-query.sql

-- Question: "show the 10 customers with the biggest year-over-year drop in orders"SELECT c.name AS customer,       SUM(CASE WHEN o.year = 2025 THEN o.value ELSE 0 END) AS revenue_2025,       SUM(CASE WHEN o.year = 2026 THEN o.value ELSE 0 END) AS revenue_2026,       SUM(CASE WHEN o.year = 2026 THEN o.value ELSE 0 END)         - SUM(CASE WHEN o.year = 2025 THEN o.value ELSE 0 END) AS changeFROM orders oJOIN customers c ON c.id = o.customer_idWHERE o.year IN (2025, 2026)GROUP BY c.nameHAVING SUM(CASE WHEN o.year = 2026 THEN o.value ELSE 0 END)     < SUM(CASE WHEN o.year = 2025 THEN o.value ELSE 0 END)ORDER BY change ASCLIMIT 10;

Validation confirms: it's a SELECT, it has a LIMIT, it only references allowed tables. The query runs on the read-only replica. The result goes back to the model, and the user sees: "Here are the 10 customers with the biggest drop. The largest: Kowalski Ltd — down PLN 142,000 (−38%). Want to see the causes broken down by product category?".

The employee wrote no SQL. Didn't wait for an analyst. And — crucially — can immediately ask a follow-up.

Where text-to-SQL genuinely delivers

Ad-hoc business analytics. Questions that come up once and have no pre-built report: "which rep has the most dead leads?", "what's the average margin on category X this quarter?".
Self-service for management and non-technical teams. Leadership, sales, finance, marketing — they ask directly, no IT queue.
A layer over ERP/CRM. A conversational front-end on the database under Optima, Subiekt or SAP — asks for data the standard reports don't show.
First line of data work. Before anyone engages an analyst, they self-check 80% of questions in seconds.

What text-to-SQL still doesn't do well — honestly

Very complex analytical queries. Multi-level CTEs, time windows, advanced statistics — accuracy drops here and a human or a tested, pre-built report is still better.
A dirty, inconsistent schema. If the database has cryptic names, no foreign keys, and data in three places at once, the model will make mistakes. A semantic layer helps, but that's work to do, not magic.
Ambiguous questions. "Show the best customers" — best by what? Revenue, margin, frequency? A good system asks back; a weak one guesses.
Critical decisions without verification. A plausible-looking but wrong answer is a real risk. For high-stakes decisions, always show the generated SQL and result for approval.

These aren't reasons to give up — they're reasons to build with architecture, not a naive prompt.

What it costs

Component	Estimate	Notes
System build (one-off)	PLN 8,000–30,000	Depends on table count, schema quality and semantic-layer scope
Semantic layer / glossary	Part of the build	The bulk of the work on a complex, inconsistent schema
LLM API cost (monthly)	PLN 100–500	Depends on question volume; open-source models cut this
Infrastructure (read-only replica)	PLN 0–300/mo	An existing database replica is often enough
Maintenance	Low	Updates on schema changes and new metrics

For comparison: a full-time analyst writing ad-hoc reports costs tens of thousands of zlotys a month. Text-to-SQL doesn't replace an analyst on complex projects — but it takes 80% of the repetitive, simple questions off their plate, the ones that block their time today.

My approach to deployment

When I build a text-to-SQL system for a company, the order is always the same and it starts with safety, not the model:

1.Read-only replica and permissions — first I cut off any possibility of writes. That's the foundation, not an add-on.
2.Semantic layer — I define the key business terms with the client: what an "active customer" is, how we calculate "margin", what "order fulfilled" means. This is where accuracy is born.
3.Few-shot from real queries — I build a library of proven examples from the company's history.
4.Validation and self-correction — a loop that catches errors before they reach the user.
5.Audit and human-in-the-loop — log everything, plus an option to preview the SQL on sensitive questions.

The result: management and non-technical teams ask for data directly, and IT stops being the bottleneck on every question. If you want to check whether your database is fit for text-to-SQL and what it would actually give your company — book a consultation: I review the schema, measure accuracy on your real questions, and show what can be built.

FAQ — Text-to-SQL

/// RELATED_RECORDS

AI & Automation

EU AI Grants for Polish SMEs 2026 — Up to PLN 3M. Which Programme, How to Apply

In 2026, Polish SMEs have access to nearly PLN 1.3 billion in EU grants for AI and automation. The problem: most companies find out about an open call after it closes. I explain the five main programmes (PARP, FENG, KPO, FEPW, BUR), what qualifies as an eligible expense (servers, GPUs, training, audits, implementation), how the application works, and what the real conditions are — without bureaucratic jargon.

11 min

AI & Automation

Computer Use — AI That Operates Any App Like a Human (No API, No Integration Required)

Your 2003 legacy ERP, a government portal with no API, an old booking system that only runs on Internet Explorer — AI can now operate all of it like a human: clicking, filling forms, reading the screen and making decisions. Computer Use is the biggest shift in automation since RPA. Here's how it works, when it makes more sense than API/n8n, what it costs, and why it replaces an entire class of $200K+ tooling.

12 min

AI & Automation

How AI Reads Invoices from Email and Enters Them into ERP

AI can automatically read an invoice from an email attachment — PDF, scan, or phone photo — and enter the data directly into an ERP system without any manual retyping. Full automation of cost invoice processing: from the mailbox to accounting.

10 min

/// AUTHOR

Paweł Wiszniewski

SEO & GEO Specialist & AI Engineer

SEO/GEO specialist (10 years) and AI engineer (3 years). I build search visibility, AI systems and automations that reduce costs and improve operational efficiency.

LinkedIn Facebook

Signal received?

Terminate
Silence

Initiate protocol. Establish connection. Let's build something loud.

> WAITING_FOR_INPUT...

BIAŁYSTOK, PL

+48 732 022 086 pawel.wiszniewski95@gmail.com

The problem it actually solves

How it works under the hood — the question-to-answer loop

Why the naive approach fails — and what actually drives accuracy

Safety — the most important chapter everyone skips

What it looks like in practice — an example

Where text-to-SQL genuinely delivers

What text-to-SQL still doesn't do well — honestly

What it costs

My approach to deployment

FAQ — Text-to-SQL

Related Articles

/// RELATED_RECORDS

EU AI Grants for Polish SMEs 2026 — Up to PLN 3M. Which Programme, How to Apply

Computer Use — AI That Operates Any App Like a Human (No API, No Integration Required)

How AI Reads Invoices from Email and Enters Them into ERP

Signal received?

TerminateSilence

Terminate
Silence