Natural language to SQL
An agent translates a plain-English question into a SQL query and runs it against a real database. The full schema is injected into the task so the agent writes correct SQL in one shot.
Before you start
Ask a business question in plain English, get a formatted table back. No hand-written SQL required. The agent translates the question into a query, runs it against the Chinook music store database, and returns the results.
Why not just ask the LLM?#
A language model can write SQL, but it does not know your schema. Without table names, column names, and foreign key relationships, the generated query is a guess. It may compile but return wrong results, or fail entirely.
The fix is simple: give the schema to the agent up front. full_schema()
produces a compact one-line-per-table format:
albums(AlbumId, Title, ArtistId -> artists.ArtistId)
artists(ArtistId, Name)
customers(CustomerId, FirstName, ..., SupportRepId -> employees.EmployeeId)
invoice_items(InvoiceLineId, InvoiceId -> invoices.InvoiceId, TrackId -> tracks.TrackId, UnitPrice, Quantity)The -> notation shows foreign keys inline so the LLM can write JOINs
without guessing which column references which table. With the full schema
in the prompt, the agent writes correct SQL on the first attempt for every
question in this tutorial.
The agent#
One primitive.
# sql_agent.py
class SQLAgent(PlanExecute):
def __init__(self, db_path: str, llm) -> None:
super().__init__(llm=llm)
self._conn = sqlite3.connect(db_path)
@primitive(read_only=True)
def run_query(self, sql: str) -> str:
"""Execute a SQL SELECT query and return the results as a formatted table."""
try:
cursor = self._conn.execute(sql)
except sqlite3.Error as e:
return f"SQL error: {e}"
rows = cursor.fetchmany(MAX_ROWS)
if not rows:
return "(no results)"
headers = [d[0] for d in cursor.description]
# format as aligned columns, truncated at COL_WIDTH
...run_query catches sqlite3.Error and returns the error message as a
string rather than raising. If the agent writes broken SQL, the error
appears as the plan result and main.py can display it cleanly.
main.py loads the schema once and prepends it to every question:
agent = SQLAgent(db_path=DB_PATH, llm=llm)
schema = agent.full_schema()
for question in QUESTIONS:
task = f"Database schema:\n{schema}\n\nQuestion: {question}"
result = agent.run(task)The plan for each question is two lines:
result = run_query("SELECT ...")
return resultRun it#
uv add opensymbolicai-core
ollama pull qwen2.5-coder:7b
uv run main.pyThe Chinook database (~860 KB) is downloaded automatically on first run.
Sample output#
Model : qwen2.5-coder:7b
Database: chinook.db
============================================================
Q: Who are the top 5 artists by total sales revenue?
Attempts: 1 (3.8s)
Results:
Name TotalSales
------------ ----------
Iron Maiden 138.6
U2 105.93
Metallica 90.09
Led Zeppelin 86.13
Lost 81.59
Q: List the top 3 genres by number of tracks.
Attempts: 1 (1.8s)
Results:
Name TrackCount
----- ----------
Rock 1297
Latin 579
Metal 374
Q: What are the 3 best-selling albums?
Attempts: 1 (3.7s)
Results:
Title TotalSales
-------------------- ----------
Battlestar Galactica 35.82
The Office, Season 3 31.84
Minha Historia 26.73
Q: Which employee supports the most customers?
Attempts: 1 (2.0s)
Results:
FirstName LastName SupportCount
--------- -------- ------------
Jane Peacock 21What to notice#
One primitive is enough. The LLM's job is to write SQL; Python's job is
to run it. There is no generate_sql primitive separate from run_query
because there is no reason to separate them. The plan that calls
run_query("SELECT ...") passes the SQL directly as an argument; the
primitive never needs to receive it from a prior step.
Schema in the task, not the system prompt. The schema is prepended to
each question string before calling agent.run(). This means the agent
can be reused against a different database by loading a different schema
and passing it the same way. Nothing about the agent class is
database-specific.
run_query returns errors as strings. If the generated SQL has a
syntax error or references a nonexistent column, sqlite3.Error is caught
and the message is returned as a plain string. The plan result is the error
message, which becomes the final answer. No exception propagates to the
caller; the agent reports what went wrong.