Coding agent

An agent that reads a Python file, asks the LLM to rewrite it, saves the result, and runs the rewritten file so its output can be checked. Three recursive programs are converted to iterative form: Fibonacci, factorial, and binary search.

Why not just prompt the LLM?

Pasting source code into a chat and asking for a rewrite works once. It breaks down when you have many files, want the result saved automatically, or need confidence the rewrite still produces the correct output.

An agent with read_file, rewrite, write_file, and run_code turns the same operation into a repeatable plan. The LLM does the transformation; Python verifies it.

The agent

Four primitives cover the full read-transform-write-validate cycle; a fifth, respond, returns the final answer:

python

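# coding_agent.py
import subprocess
import sys

# PlanExecute and primitive come from opensymbolicai-core; that import line is not shown here.

class CodingAgent(PlanExecute):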

    @primitive(read_only=True)
    def read_file(self, path: str) -> str:
        """Return the contents of a Python source file."""
        with open(path, encoding="utf-8") as f:
            return f.read()

    @primitive(read_only=True)
    def rewrite(self, source: str, instruction: str) -> str:
        """Apply instruction to source and return the new Python source code."""
        prompt = (
            f"{instruction}\n\n"
            "Return ONLY the new Python source code, no explanation, no markdown fences.\n\n"
            f"Source:\n{source}"
        )
        raw = self._llm.generate(prompt).text.strip()
        # Strip markdown fences if the model adds them anyway
        if raw.startswith("```"):
            raw = raw.split("\n", 1)[-1]
        if raw.endswith("```"):
            raw = raw.rsplit("```", 1)[0]
        return raw.strip()

    @primitive(read_only=False)
    def write_file(self, path: str, content: str) -> str:
        """Write content to path."""
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return f"wrote {path}"

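    @primitive(read_only=True)
    def run_code(self, path: str) -> str:
        """Run a Python file and return stdout, or an ERROR string on failure or timeout."""
        try:
            result = subprocess.run(
                [sys.executable, path],
                capture_output=True, text=True, timeout=10,
            )
        except subprocess.TimeoutExpired:
            # Without this, a non-terminating rewrite would raise instead of reporting
            return "ERROR:\ntimed out after 10 seconds"
        if result.returncode != 0:
            return f"ERROR:\n{result.stderr.strip()}"
        return result.stdout.strip()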

    @primitive(read_only=True)
    def respond(self, message: str) -> str:
        """Return message as the final response."""
        return message

The generated plan has the same shape for every task:

python

source       = read_file("programs/fib.py")
updated      = rewrite(source, "Convert the recursive function to iterative...")
write_result = write_file("programs/fib.py", updated)
output       = run_code("programs/fib.py")
return respond(output)

Run it

bash

uv add opensymbolicai-core
ollama pull qwen2.5-coder:7b
uv run main.py

Sample output

text

Model: qwen2.5-coder:7b
============================================================

programs/fib.py
----------------------------------------
Diff:
--- a/programs/fib.py
+++ b/programs/fib.py
@@ -1,5 +1,8 @@
 def fib(n):
     if n <= 1:
         return n
-    return fib(n - 1) + fib(n - 2)
+    a, b = 0, 1
+    for _ in range(2, n + 1):
+        a, b = b, a + b
+    return b

Output (8.8s):
  0 1 1 2 3 5 8 13 21 34

programs/factorial.py
----------------------------------------
Diff:
--- a/programs/factorial.py
+++ b/programs/factorial.py
@@ -1,5 +1,5 @@
 def factorial(n):
-    if n == 0:
-        return 1
-    return n * factorial(n - 1)
+    result = 1
+    for i in range(2, n + 1):
+        result *= i
+    return result

Output (3.9s):
  0! = 1
  1! = 1
  2! = 2
  ...
  7! = 5040

programs/binary_search.py
----------------------------------------
Diff:
--- a/programs/binary_search.py
+++ b/programs/binary_search.py
@@ -1,13 +1,13 @@
-def binary_search(arr, target, low=0, high=None):
-    if high is None:
-        high = len(arr) - 1
-    if low > high:
-        return -1
-    mid = (low + high) // 2
-    if arr[mid] == target:
-        return mid
-    elif arr[mid] < target:
-        return binary_search(arr, target, mid + 1, high)
-    else:
-        return binary_search(arr, target, low, mid - 1)
+def binary_search(arr, target):
+    low, high = 0, len(arr) - 1
+    while low <= high:
+        mid = (low + high) // 2
+        if arr[mid] == target:
+            return mid
+        elif arr[mid] < target:
+            low = mid + 1
+        else:
+            high = mid - 1
+    return -1

Output (6.0s):
  Array: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  Search 6: index 3
  Search 7: index -1
  Search 14: index 7

What to notice

write_file is the only mutating primitive. It is marked read_only=False; the other four primitives are marked read_only=True. This makes it easy to audit what the agent can change directly: search for read_only=False in the class and you have the complete list of primitives that write. One caveat: run_code is marked read-only, but the script it executes can still have side effects of its own.
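
The audit described above can also be done in code rather than by text search. The sketch below uses a hypothetical stand-in for the primitive decorator that only records the flag on the function; the real decorator in opensymbolicai-core may store its metadata differently, so the attribute name _read_only and the DemoAgent class are assumptions made for illustration.

```python
import inspect


def primitive(read_only: bool):
    """Hypothetical stand-in for the library decorator: it only records the flag."""
    def mark(fn):
        fn._read_only = read_only  # assumed attribute name, not the library's
        return fn
    return mark


class DemoAgent:
    """Illustrative class with the same flags as three of CodingAgent's primitives."""

    @primitive(read_only=True)
    def read_file(self, path: str) -> str:
        return ""

    @primitive(read_only=False)
    def write_file(self, path: str, content: str) -> str:
        return f"wrote {path}"

    @primitive(read_only=True)
    def run_code(self, path: str) -> str:
        return ""


def mutating_primitives(cls) -> list[str]:
    """Return the names of methods whose recorded flag is read_only=False."""
    return sorted(
        name
        for name, fn in inspect.getmembers(cls, inspect.isfunction)
        if getattr(fn, "_read_only", None) is False
    )


print(mutating_primitives(DemoAgent))  # ['write_file']
```

A check like this could run in a unit test, so that adding a second mutating primitive is a deliberate, reviewed change.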

run_code closes the loop. The agent executes its own output with the same Python interpreter that runs the agent (sys.executable). If write_file wrote broken code, run_code returns the stderr and that string becomes the final answer, making the failure visible. There is no separate test harness; the program's own print statements are the assertion.
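
If you want the comparison to be explicit instead of read off the printed output, one option is to run the file before and after the rewrite and compare the two strings. This is not part of the agent as described; it is a minimal sketch in which the helper names run and outputs_match, and the print driver at the bottom of each program, are assumptions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Two versions of the same program; the print driver is assumed for illustration.
RECURSIVE = '''\
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(*(fib(i) for i in range(10)))
'''

ITERATIVE = '''\
def fib(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

print(*(fib(i) for i in range(10)))
'''


def run(path: Path) -> str:
    """Same contract as run_code: stdout on success, an ERROR string on failure."""
    result = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0:
        return f"ERROR:\n{result.stderr.strip()}"
    return result.stdout.strip()


def outputs_match(before_src: str, after_src: str) -> bool:
    """Run both versions from a temporary file and compare what they print."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "program.py"
        path.write_text(before_src, encoding="utf-8")
        before = run(path)
        path.write_text(after_src, encoding="utf-8")
        after = run(path)
    # A baseline that already fails is not a valid reference.
    return not before.startswith("ERROR:") and before == after


print(outputs_match(RECURSIVE, ITERATIVE))  # True
```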

The diff is computed outside the agent. main.py reads the file before and after agent.run() and calls difflib.unified_diff. The agent does not see or produce the diff. This is a useful separation: the agent is responsible for the transformation; the caller is responsible for recording what changed.
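
The caller-side diff can be sketched with the standard library alone. The helper name diff_text and the sample strings are illustrative, and main.py itself is not shown in this document, so this is an approximation of the approach, not its actual code. In the real flow, before and after would be the file contents read on either side of agent.run().

```python
import difflib


def diff_text(before: str, after: str, path: str) -> str:
    """Return a unified diff between two versions of the same file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


# Stand-ins for the file contents before and after the agent runs.
before = "def double(n):\n    return n + n\n"
after = "def double(n):\n    return 2 * n\n"

print(diff_text(before, after, "programs/double.py"))
```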

rewrite strips markdown fences defensively. The prompt says to return only code, but some model versions add fences anyway. Stripping them in the primitive means the LLM's formatting habits do not break write_file. Put cleanup logic in the primitive that needs clean input, not in the plan.
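
The cleanup step is easy to check in isolation. The sketch below lifts the same logic out of rewrite into a standalone function; the name strip_fences is an assumption, and the fence marker is built with string multiplication only to keep literal backticks out of the example.

```python
FENCE = "`" * 3  # the three-backtick markdown fence marker


def strip_fences(raw: str) -> str:
    """Remove a leading fence line and a trailing fence, if present."""
    raw = raw.strip()
    if raw.startswith(FENCE):
        # Drop the whole first line, including any language tag after the fence
        raw = raw.split("\n", 1)[-1]
    if raw.endswith(FENCE):
        raw = raw.rsplit(FENCE, 1)[0]
    return raw.strip()


fenced = f"{FENCE}python\nx = 1\n{FENCE}"
print(strip_fences(fenced))   # x = 1
print(strip_fences("x = 1"))  # x = 1
```

Unfenced input passes through unchanged, so the primitive behaves the same whether or not the model follows the prompt.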

The same primitives handle any source-level transformation: adding type hints, renaming a function, converting a class to a dataclass, or translating between languages (though run_code as written only executes Python). The instruction passed to rewrite is the only thing that changes.