Chat frontend over a SQL database
A FastAPI backend serves a browser chat UI. Questions become SQL via an NL-to-SQL agent, follow-ups are rephrased using conversation history, and the generated SQL is shown alongside the answer.
Before you start
A browser chat UI over a real database. Type a question, see the answer as a table. Ask a follow-up, and the backend rephrases it into a standalone question using conversation history before running the agent.
This tutorial wraps the NL-to-SQL agent from Track 46 in a FastAPI server and a single HTML page with no build step.
How it works#
Three things happen on every POST to /chat:
- Rephrase -- if there is history, the LLM rewrites the message as a standalone question that can be understood without the conversation.
- Run the agent -- schema is prepended to the (possibly rephrased)
question, the
SQLAgentgenerates a plan and runs the query. - Return -- answer table, the SQL that produced it, and the rephrased question (when rephrasing happened).
The backend retries once if the agent returns a SQL error, which covers occasional column or table name mistakes from the model.
Follow-up rephrasing#
# main.py
def rephrase(question: str, history: list[Message]) -> str | None:
if not history:
return None
turns = "\n".join(
f"{m.role.capitalize()}: {m.content}" for m in history[-6:]
)
prompt = (
f"Conversation so far:\n{turns}\n\n"
f"Follow-up question: {question}\n\n"
"Rewrite the follow-up as a complete, standalone question that can be "
"understood without the conversation history. "
"If it is already standalone, output it unchanged. "
"Output only the rewritten question, nothing else."
)
result = _llm.generate(prompt).text.strip()
return result if result.lower() != question.lower() else NoneIf the rewritten question is identical to the original, rephrase returns
None and the frontend shows no rephrased label.
The agent#
The same one-primitive pattern as Track 46. The full schema is prepended once on startup and included with every question:
# main.py
@app.on_event("startup")
async def startup() -> None:
_agent = SQLAgent(db_path=DB_PATH, llm=_llm)
_schema = _agent.full_schema()
@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest) -> ChatResponse:
rephrased = rephrase(req.message, req.history)
question = rephrased or req.message
task = f"Database schema:\n{_schema}\n\nQuestion: {question}"
result = _agent.run(task)
answer = result.result if isinstance(result.result, str) else "(no result)"
if answer.startswith("SQL error"):
result = _agent.run(task)
answer = result.result if isinstance(result.result, str) else "(no result)"
sql = extract_sql(result.plan or "")
return ChatResponse(answer=answer, sql=sql, rephrased=rephrased)API#
POST /chat
{
"message": "...",
"history": [{"role": "user"|"assistant", "content": "..."}]
}
→ { "answer": "...", "sql": "...", "rephrased": "..." | null }Frontend#
A single chat.html file served directly by FastAPI. No build step, no
framework. Each response shows the answer table, a collapsible "View SQL"
block with the generated query, and a "Rephrased as" line when the question
was rewritten.
Run it#
uv add opensymbolicai-core fastapi uvicorn
ollama pull qwen2.5-coder:7b
uv run uvicorn main:app --host 127.0.0.1 --port 8048Open http://127.0.0.1:8048. The Chinook database (~860 KB) is downloaded
automatically on first run.
Sample conversation#
You: Which employee supports the most customers?
AI: FirstName LastName SupportCount
--------- -------- ------------
Jane Peacock 21
View SQL ▸
SELECT e.FirstName, e.LastName, COUNT(c.CustomerId) AS SupportCount
FROM employees e
JOIN customers c ON e.EmployeeId = c.SupportRepId
GROUP BY e.EmployeeId
ORDER BY SupportCount DESC
LIMIT 1
You: What country are most of her customers from?
AI: ↳ Rephrased as: "What country are most of Jane Peacock's customers from?"
Country CustomerCount
------- -------------
USA 13
View SQL ▸
SELECT c.Country, COUNT(c.CustomerId) AS CustomerCount
FROM customers c
JOIN employees e ON c.SupportRepId = e.EmployeeId
WHERE e.FirstName = 'Jane' AND e.LastName = 'Peacock'
GROUP BY c.Country
ORDER BY CustomerCount DESC
LIMIT 5What to notice#
Rephrasing makes the agent stateless. The agent receives one self-contained task string on each call. It has no memory of previous questions. Conversation context lives in the browser and is sent with every request. The backend converts it to a standalone question before the agent sees it.
extract_sql pulls the query from the plan. The agent's plan string
contains the SQL as an argument to run_query(...). A regex finds the
SELECT statement and trims it at the first closing quote or return
keyword. This is fragile for complex SQL with string literals containing
keywords, which is why re.DOTALL and careful stop-marker trimming are used.
One retry for SQL errors. When run_query returns "SQL error: ...",
main.py runs the agent a second time with the same task. The second
attempt usually succeeds because the model can generate slightly different
SQL on a new call. Two attempts is enough for the questions in this tutorial.
The server is stateless between requests. _agent, _llm, and _schema
are module-level globals initialised once at startup. Each /chat call reads
them but writes nothing. Multiple users can query concurrently without
interfering with each other.