Chat frontend over a SQL database

A browser chat UI over a real database. Type a question, see the answer as a table. Ask a follow-up, and the backend rephrases it into a standalone question using conversation history before running the agent.

This tutorial wraps the NL-to-SQL agent from Track 46 in a FastAPI server and a single HTML page with no build step.

How it works#

Three things happen on every POST to /chat:

Rephrase -- if there is history, the LLM rewrites the message as a standalone question that can be understood without the conversation.
Run the agent -- schema is prepended to the (possibly rephrased) question, the SQLAgent generates a plan and runs the query.
Return -- answer table, the SQL that produced it, and the rephrased question (when rephrasing happened).

The backend retries once if the agent returns a SQL error, which covers occasional column or table name mistakes from the model.

Follow-up rephrasing#

python

# main.py
def rephrase(question: str, history: list[Message]) -> str | None:
    if not history:
        return None
    turns = "\n".join(
        f"{m.role.capitalize()}: {m.content}" for m in history[-6:]
    )
    prompt = (
        f"Conversation so far:\n{turns}\n\n"
        f"Follow-up question: {question}\n\n"
        "Rewrite the follow-up as a complete, standalone question that can be "
        "understood without the conversation history. "
        "If it is already standalone, output it unchanged. "
        "Output only the rewritten question, nothing else."
    )
    result = _llm.generate(prompt).text.strip()
    return result if result.lower() != question.lower() else None

If the rewritten question is identical to the original, rephrase returns None and the frontend shows no rephrased label.

The agent#

The same one-primitive pattern as Track 46. The full schema is prepended once on startup and included with every question:

python

# main.py
@app.on_event("startup")
async def startup() -> None:
    _agent  = SQLAgent(db_path=DB_PATH, llm=_llm)
    _schema = _agent.full_schema()

@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest) -> ChatResponse:
    rephrased = rephrase(req.message, req.history)
    question  = rephrased or req.message

    task   = f"Database schema:\n{_schema}\n\nQuestion: {question}"
    result = _agent.run(task)
    answer = result.result if isinstance(result.result, str) else "(no result)"

    if answer.startswith("SQL error"):
        result = _agent.run(task)
        answer = result.result if isinstance(result.result, str) else "(no result)"

    sql = extract_sql(result.plan or "")
    return ChatResponse(answer=answer, sql=sql, rephrased=rephrased)

API#

text

POST /chat
{
  "message": "...",
  "history": [{"role": "user"|"assistant", "content": "..."}]
}

→ { "answer": "...", "sql": "...", "rephrased": "..." | null }

Frontend#

A single chat.html file served directly by FastAPI. No build step, no framework. Each response shows the answer table, a collapsible "View SQL" block with the generated query, and a "Rephrased as" line when the question was rewritten.

Run it#

bash

uv add opensymbolicai-core fastapi uvicorn
ollama pull qwen2.5-coder:7b
uv run uvicorn main:app --host 127.0.0.1 --port 8048

Open http://127.0.0.1:8048. The Chinook database (~860 KB) is downloaded automatically on first run.

Sample conversation#

text

You:  Which employee supports the most customers?

AI:   FirstName  LastName  SupportCount
      ---------  --------  ------------
      Jane       Peacock   21

      View SQL ▸
        SELECT e.FirstName, e.LastName, COUNT(c.CustomerId) AS SupportCount
        FROM employees e
        JOIN customers c ON e.EmployeeId = c.SupportRepId
        GROUP BY e.EmployeeId
        ORDER BY SupportCount DESC
        LIMIT 1

You:  What country are most of her customers from?

AI:   ↳ Rephrased as: "What country are most of Jane Peacock's customers from?"

      Country  CustomerCount
      -------  -------------
      USA      13

      View SQL ▸
        SELECT c.Country, COUNT(c.CustomerId) AS CustomerCount
        FROM customers c
        JOIN employees e ON c.SupportRepId = e.EmployeeId
        WHERE e.FirstName = 'Jane' AND e.LastName = 'Peacock'
        GROUP BY c.Country
        ORDER BY CustomerCount DESC
        LIMIT 5

What to notice#

Rephrasing makes the agent stateless. The agent receives one self-contained task string on each call. It has no memory of previous questions. Conversation context lives in the browser and is sent with every request. The backend converts it to a standalone question before the agent sees it.

extract_sql pulls the query from the plan. The agent's plan string contains the SQL as an argument to run_query(...). A regex finds the SELECT statement and trims it at the first closing quote or return keyword. This is fragile for complex SQL with string literals containing keywords, which is why re.DOTALL and careful stop-marker trimming are used.

One retry for SQL errors. When run_query returns "SQL error: ...", main.py runs the agent a second time with the same task. The second attempt usually succeeds because the model can generate slightly different SQL on a new call. Two attempts is enough for the questions in this tutorial.

The server is stateless between requests. _agent, _llm, and _schema are module-level globals initialised once at startup. Each /chat call reads them but writes nothing. Multiple users can query concurrently without interfering with each other.