Founder Insight

Text-to-SQL: The Naive Approach That Actually Works

Mark Hay, CTO & Co-Founder at TextQL

Here’s a data analytics approach that sounds too simple to work: take your database schema, paste it into Claude’s context window, ask a business question in plain English, let the model write SQL, run the query, and return the results. No ETL pipeline. No semantic layer. No vector database.

Mark Hay, CTO and co-founder of TextQL — a $17 million agentic data analytics platform serving customers including Amazon and Dropbox — builds sophisticated data infrastructure for a living. But when presented with this bare-bones approach, his response was the opposite of what you’d expect from someone selling a more complex product.

“You might be expecting I have a ton of criticism. But that is a very good start.”

Why the Simple Version Works

The approach works because modern LLMs are surprisingly good at reading database schemas and generating SQL from them. When you dump a schema into a model’s context window, the model can read table names, column types, relationships, and constraints, then translate a natural language question into a syntactically valid query against the right tables.

For small teams with a manageable number of data sources, this might be all you need. The model handles the translation layer that previously required a data analyst: understanding what the business user wants, mapping it to the right tables, writing the join logic, and formatting the output.
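As a rough illustration of that loop, here is a minimal sketch in Python. It uses SQLite from the standard library as a stand-in for a real warehouse, and `ask_model` is a hypothetical placeholder that returns canned SQL so the example runs offline; in practice you would swap in a call to your LLM provider. The `orders` table and the prompt wording are assumptions made for the example, not anything TextQL prescribes.

```python
import sqlite3


def schema_text(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements exactly as SQLite stored them."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


def build_prompt(schema: str, question: str) -> str:
    """Put the schema and the business question into one prompt."""
    return (
        "You are given this database schema:\n\n"
        f"{schema}\n\n"
        "Write one SQLite SELECT statement that answers the question. "
        "Return only SQL.\n\n"
        f"Question: {question}"
    )


def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (assumption: replace with a real client)."""
    return (
        "SELECT region, SUM(amount) AS revenue "
        "FROM orders GROUP BY region ORDER BY revenue DESC"
    )


def answer(conn: sqlite3.Connection, question: str):
    """Schema in, question in, SQL out, rows back."""
    sql = ask_model(build_prompt(schema_text(conn), question))
    return sql, conn.execute(sql).fetchall()


# Toy data so the sketch is runnable end to end.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL);
    INSERT INTO orders (region, amount) VALUES ('EMEA', 120.0), ('AMER', 300.0), ('EMEA', 80.0);
    """
)
sql, rows = answer(conn, "Which region has the most revenue?")
print(sql)
print(rows)
```

Everything interesting happens inside the model call; the surrounding code is only schema extraction, string formatting, and query execution, which is why the approach can be tried so quickly.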

Hay’s assessment is practical, and he is candid about his own bias. “Of course, I have a product that has a lot more stuff on it. So I’m biased and generally do believe you might need more. But that is a very good start.”

Where It Breaks — and What to Add

The naive approach hits limits at three points. The first is security. An unconstrained LLM writing SQL against a production database is a risk most enterprises won’t accept. Query sandboxing, access controls, and audit trails aren’t optional — they’re the first thing to add.
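One way to picture those three controls is the sketch below. It is an illustration under stated assumptions, not a complete security model: SQLite stands in for the production database, `run_guarded` and `is_single_select` are hypothetical helpers, and the audit trail is an in-memory list. It layers a coarse statement check on top of a connection switched to read-only with SQLite's `PRAGMA query_only`, and records every attempt.

```python
import sqlite3
import time

audit_log: list[dict] = []  # assumption: a real system would persist this


def is_single_select(sql: str) -> bool:
    """Coarse check: exactly one statement, starting with SELECT or WITH."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False
    return stripped.split(None, 1)[0].upper() in {"SELECT", "WITH"}


def run_guarded(conn: sqlite3.Connection, sql: str, user: str, max_rows: int = 1000):
    """Log the attempt, reject non-SELECT text, then run on a read-only connection."""
    entry = {"ts": time.time(), "user": user, "sql": sql, "status": "rejected"}
    audit_log.append(entry)
    if not is_single_select(sql):
        raise PermissionError("only a single SELECT statement is allowed")
    try:
        rows = conn.execute(sql).fetchmany(max_rows)  # cap the result size
    except sqlite3.Error:
        entry["status"] = "error"
        raise
    entry["status"] = "ok"
    return rows


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    INSERT INTO orders (amount) VALUES (120.0), (300.0);
    """
)
# Second layer: the connection itself refuses writes from here on.
conn.execute("PRAGMA query_only = ON")

attempts = [
    ("alice", "SELECT COUNT(*) FROM orders"),
    ("bob", "DROP TABLE orders"),
    # Passes the coarse text check, but the read-only connection blocks it.
    ("carol", "WITH t AS (SELECT 1) DELETE FROM orders"),
]
for user, sql in attempts:
    try:
        print(user, run_guarded(conn, sql, user))
    except (PermissionError, sqlite3.Error) as exc:
        print(user, "blocked:", exc)
```

The text check alone is easy to get around, which is why the sketch also relies on the database refusing writes. On a real warehouse the equivalent would be a database role with read-only grants rather than a pragma.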

The second is accuracy at scale. When you have seven data sources with overlapping column names, ambiguous terminology, and undocumented relationships, the model starts guessing. This is where semantic layers earn their keep — not as upfront infrastructure, but as targeted fixes for the queries that keep failing.
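A targeted fix can be as small as a handful of hand-written notes appended to the schema before it goes into the prompt. The sketch below assumes two hypothetical tables that both have an `amount` column with different meanings; the `COLUMN_NOTES` dictionary and the `annotate` helper are illustrative names, not part of any product.

```python
# Hypothetical notes, added only for columns whose queries kept coming back wrong.
COLUMN_NOTES = {
    "orders.amount": "Order value in USD, before refunds.",
    "billing.amount": "Invoiced value in cents, after refunds.",
}


def annotate(schema: str, notes: dict[str, str]) -> str:
    """Append column notes to the schema text as SQL comments."""
    lines = [schema.rstrip(), "", "-- Column notes (maintained by hand):"]
    for column, note in sorted(notes.items()):
        lines.append(f"-- {column}: {note}")
    return "\n".join(lines)


schema = (
    "CREATE TABLE orders (id INTEGER, amount REAL);\n"
    "CREATE TABLE billing (id INTEGER, amount INTEGER);"
)
print(annotate(schema, COLUMN_NOTES))
```

The list grows one entry at a time, driven by the questions that actually fail, rather than being designed in full before anyone has asked anything.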

The third is volume. A handful of users asking occasional questions is fine. But when AI agents start generating queries at machine speed — potentially 100 to 1,000 times the volume of human analysts — the underlying data infrastructure needs to handle a fundamentally different load.
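What handling that load looks like depends heavily on the warehouse, but two common first steps are caching repeated queries and capping how many run at once. The sketch below is one possible shape for that, with a hypothetical `QueryGateway` class wrapping whatever function actually executes SQL; the limits and the fake backend are assumptions chosen to keep the example small.

```python
import threading
import time


class QueryGateway:
    """Cache identical queries for a short time and cap concurrent executions."""

    def __init__(self, run_query, max_concurrent: int = 4, ttl_seconds: float = 60.0):
        self._run_query = run_query
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, list]] = {}
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def execute(self, sql: str) -> list:
        key = sql.strip()  # exact-text match; no attempt at SQL normalization
        now = time.monotonic()
        with self._lock:
            cached = self._cache.get(key)
            if cached is not None and now - cached[0] < self._ttl:
                self.hits += 1
                return cached[1]
            self.misses += 1
        with self._slots:  # at most max_concurrent queries reach the backend
            rows = self._run_query(sql)
        with self._lock:
            self._cache[key] = (time.monotonic(), rows)
        return rows


backend_calls = 0


def fake_backend(sql: str) -> list:
    """Stand-in for the real database call; counts how often it is reached."""
    global backend_calls
    backend_calls += 1
    return [(42,)]


gateway = QueryGateway(fake_backend, max_concurrent=2, ttl_seconds=30.0)
for _ in range(1000):  # an agent asking the same question 1,000 times
    gateway.execute("SELECT COUNT(*) FROM orders")
print(backend_calls, gateway.hits, gateway.misses)
```

In this toy run the backend is reached once and the remaining 999 requests are served from the cache. Real agent traffic is rarely that repetitive, so the concurrency cap usually matters more than the cache.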

Hay’s recommended path: add each layer only when the simple approach proves it’s needed. “Start with something. And then on top of that, layer on capabilities.”

The Counterintuitive Implication

This framing has a practical implication for data team leaders evaluating AI analytics tools. Instead of running a six-month vendor evaluation and infrastructure buildout, you can test the core hypothesis in an afternoon. Does an LLM writing SQL against your actual schema produce useful results for your actual business questions?

If yes, you’ve validated the approach. Now you know what to invest in — better security, semantic modeling for the tricky queries, observability for the high-stakes ones.

If no, you’ve learned something too — and you learned it in hours instead of months.
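The afternoon test can be as simple as a list of real questions with answers you already trust, run through the model and scored. The sketch below assumes SQLite as the database and a hypothetical `ask_model` that returns canned SQL, one of which is deliberately broken, so the harness runs offline and shows both outcomes.

```python
import sqlite3


def ask_model(question: str) -> str:
    """Hypothetical stand-in for the LLM call; canned SQL keeps the sketch offline."""
    canned = {
        "How many orders are there?": "SELECT COUNT(*) FROM orders",
        "What is total revenue?": "SELECT SUM(amount) FROM orders",
        # Deliberately broken (wrong table name) to show a failing case.
        "Which region has the most orders?": "SELECT region FROM order",
    }
    return canned[question]


def evaluate(conn: sqlite3.Connection, cases):
    """Run each question and compare the rows with the trusted answer."""
    results = []
    for question, expected in cases:
        sql = ask_model(question)
        try:
            actual = conn.execute(sql).fetchall()
            passed = actual == expected
            detail = sql if passed else f"got {actual!r}"
        except sqlite3.Error as exc:
            passed, detail = False, f"error: {exc}"
        results.append((question, passed, detail))
    return results


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL);
    INSERT INTO orders (region, amount) VALUES ('EMEA', 120.0), ('AMER', 300.0), ('EMEA', 80.0);
    """
)
cases = [
    ("How many orders are there?", [(3,)]),
    ("What is total revenue?", [(500.0,)]),
    ("Which region has the most orders?", [("EMEA",)]),
]
results = evaluate(conn, cases)
for question, passed, detail in results:
    print("PASS" if passed else "FAIL", question, "|", detail)
print(f"{sum(p for _, p, _ in results)}/{len(results)} passed")
```

The failing questions are the useful output: they are the shortlist of where security, semantic definitions, or observability would need to come first.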

FAQ

Can you use ChatGPT or Claude to query a database directly?

Yes. Paste your database schema into an LLM’s context window, ask a business question in natural language, and let the model generate SQL. For small teams with straightforward schemas, this produces surprisingly accurate queries. Add security sandboxing before running queries against production data — the model generates SQL, but execution should be controlled.

What is text-to-SQL and how does it work with LLMs?

Text-to-SQL is the process of converting natural language questions into database queries. Modern LLMs handle this by reading a database schema (table names, columns, types, relationships) and generating syntactically correct SQL that answers the question. No specialized training or fine-tuning is needed — general-purpose models like Claude can do this out of the box.

When does the naive text-to-SQL approach stop working?

The simple approach — schema in context, LLM writes SQL — breaks at three points: security (production database access needs controls), accuracy at scale (multiple sources with ambiguous terminology cause errors), and query volume (AI agents generating thousands of queries overwhelm systems designed for human-speed usage). Add each layer when the simple version proves it’s needed.

How accurate is LLM-generated SQL for business queries?

For well-structured schemas with clear column names and standard relationships, LLM-generated SQL is surprisingly accurate. Accuracy drops with ambiguous terminology, undocumented relationships, or domain-specific jargon. Adding semantic definitions for the most-queried columns addresses the biggest accuracy gaps without requiring a full semantic layer upfront.

What is the fastest way to prototype AI data analytics?

Dump your database schema into an LLM, ask it to write SQL for a real business question, run the query, and check the results. This can be done in under an hour. If the results are useful, you’ve validated the approach and can invest in security, accuracy improvements, and scale. If not, you’ve learned what’s missing without a six-month infrastructure project.

How do enterprise teams add AI to their existing data stack?

Start with the simplest approach: connect your schema to an LLM and let it write queries. Most enterprise data stacks (Snowflake, BigQuery, PostgreSQL) expose schemas that models can read directly. Layer on enterprise requirements incrementally: query sandboxing, role-based access, semantic definitions for critical columns, and observability for compliance-sensitive queries.

What data preparation is needed before using AI for analytics?

Minimal. Modern LLMs work with raw database schemas without requiring data cleaning, migration, or consolidation. TextQL’s approach — “your data is a disaster, but we can work with it” — reflects a broader shift: AI handles messy data better than most teams expect. The one non-negotiable is security infrastructure for query execution.

How does TextQL compare to building text-to-SQL in-house?

The in-house approach (paste the schema into Claude, let it write SQL, run the query) works for small teams and simple schemas. TextQL adds enterprise capabilities: connections to 7+ data sources simultaneously, incremental ontology building, query observability, security controls, and the ability to handle AI agent query volumes at 100 to 1,000 times the volume of human analysts. The gap widens with data complexity and user count.
