Stopping a batch when the token budget runs out

Track 28 showed how to read token counts from a single result. The one new thing here: a thin wrapper that accumulates those counts across a batch and stops before starting a task it cannot afford.

The wrapper

python

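# Floor checked before each task; not an estimate of what a task will cost.
MIN_PLANNING_TOKENS = 200


class BudgetExceeded(Exception):
    """Raised when too few tokens remain to start another task."""


class BudgetedRunner:
    """Wraps an agent and accumulates token usage across a batch of tasks."""

    def __init__(self, agent, budget: int) -> None:
        self._agent = agent
        self._budget = budget
        self._used = 0

    @property
    def tokens_remaining(self) -> int:
        # Clamp at zero: the last task that ran may have gone past the budget.
        return max(0, self._budget - self._used)

    def run(self, task: str):
        # Guard first, so a blocked task never reaches the model.
        if self.tokens_remaining < MIN_PLANNING_TOKENS:
            raise BudgetExceeded(
                f"{self.tokens_remaining} tokens remaining, "
                f"need at least {MIN_PLANNING_TOKENS} to plan"
            )
        result = self._agent.run(task)
        # Usage is only known after the task has run.
        self._used += result.metrics.plan_tokens.total_tokens
        return result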

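The guard fires before the task runs, so a blocked task costs zero tokens. MIN_PLANNING_TOKENS is a floor, not an estimate of what a task will cost. Here it sits below the typical input token count, so it only stops tasks that have almost nothing left to work with; a task that starts just above the floor can still push usage past the budget. Raise it toward your typical per-task cost if you need a stricter stop.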
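
To make the floor's behaviour concrete, here is a minimal sketch that swaps the real agent for a stub. FakeAgent, its fixed 450-token cost, and the 700-token budget are all assumptions made up for illustration; only the wrapper logic comes from the code above, repeated so the sketch runs on its own.

```python
from types import SimpleNamespace

MIN_PLANNING_TOKENS = 200


class BudgetExceeded(Exception):
    pass


class BudgetedRunner:
    # Same logic as the wrapper above, repeated so this sketch is self-contained.
    def __init__(self, agent, budget: int) -> None:
        self._agent = agent
        self._budget = budget
        self._used = 0

    @property
    def tokens_remaining(self) -> int:
        return max(0, self._budget - self._used)

    def run(self, task: str):
        if self.tokens_remaining < MIN_PLANNING_TOKENS:
            raise BudgetExceeded(f"{self.tokens_remaining} tokens remaining")
        result = self._agent.run(task)
        self._used += result.metrics.plan_tokens.total_tokens
        return result


class FakeAgent:
    """Hypothetical stand-in for the real agent: every task costs a fixed amount."""

    def __init__(self, cost: int) -> None:
        self.cost = cost
        self.calls = 0  # counts how often the "model" was actually invoked

    def run(self, task: str):
        self.calls += 1
        usage = SimpleNamespace(total_tokens=self.cost)
        return SimpleNamespace(result=None, metrics=SimpleNamespace(plan_tokens=usage))


agent = FakeAgent(cost=450)
runner = BudgetedRunner(agent, budget=700)

runner.run("task 1")  # 700 remaining, above the floor: runs, 250 remain
runner.run("task 2")  # 250 remaining, still above the floor: runs, usage is now 900
overshoot = runner._used - 700

try:
    runner.run("task 3")  # 0 remaining, below the floor: blocked
    blocked = False
except BudgetExceeded:
    blocked = True

print(f"used={runner._used} overshoot={overshoot} remaining={runner.tokens_remaining}")
print(f"blocked={blocked} agent_calls={agent.calls}")
```

The stub is called twice, not three times, so the blocked task costs nothing. The second task also shows the limit of a floor: it started with 250 tokens, spent 450, and left usage 200 tokens over the budget.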

Run six tasks against a 1,000-token budget

python

# main.py
from calc import Calc
from opensymbolicai.llm import LLMConfig

# BudgetedRunner and BudgetExceeded are the wrapper classes shown earlier;
# define them in this file or import them from wherever you saved them.

TASKS = [
    "What is 7 + 3?",
    "What is 12 * 15 - 47?",
    "What is 8 factorial?",
    "What is the 10th Fibonacci number?",
    "What is 6 factorial plus the 8th Fibonacci number?",
    "What is (factorial of 5) divided by (fibonacci of 6), then add 12?",
]

BUDGET = 1000

llm = LLMConfig(provider="ollama", model="qwen2.5-coder:7b")
runner = BudgetedRunner(Calc(llm=llm), budget=BUDGET)

print(f"Budget: {BUDGET} tokens\n")

for i, task in enumerate(TASKS):
    try:
        result = runner.run(task)
        print(f"  ok  {task}")
        print(f"      result={result.result}  used={runner._used}  remaining={runner.tokens_remaining}\n")
    except BudgetExceeded as e:
        print(f"  --  {task}")
        print(f"      BudgetExceeded: {e}\n")
        # Tasks after the blocked one are never attempted.
        print(f"  Stopping -- {len(TASKS) - i - 1} task(s) skipped.")
        break

bash

uv run main.py

Output:

text

Budget: 1000 tokens

  ok  What is 7 + 3?
      result=10  used=442  remaining=558

  ok  What is 12 * 15 - 47?
      result=133  used=902  remaining=98

  --  What is 8 factorial?
      BudgetExceeded: 98 tokens remaining, need at least 200 to plan

  Stopping -- 3 task(s) skipped.
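
The numbers in that run can be checked by hand. The short sketch below replays the arithmetic from the two cumulative `used` values printed above; it calls no model, and the variable names are only illustrative.

```python
BUDGET = 1000
MIN_PLANNING_TOKENS = 200

# Cumulative "used" values taken from the printed run.
used_after = [442, 902]

costs = []
previous = 0
for n, used in enumerate(used_after, start=1):
    cost = used - previous        # tokens spent by this task alone
    remaining = BUDGET - used     # what the guard sees before the next task
    costs.append(cost)
    print(f"task {n}: cost={cost} remaining={remaining}")
    previous = used

# The guard's check before task 3.
remaining = BUDGET - used_after[-1]
blocked = remaining < MIN_PLANNING_TOKENS
print(f"task 3: remaining={remaining} floor={MIN_PLANNING_TOKENS} blocked={blocked}")
```

Task 2 started with 558 tokens and spent 460, which is why only 98 are left when the guard checks task 3.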

What to notice

The guard checks before, not after. Task 3 is blocked because only 98 tokens remain, which is below the 200-token floor. It would not have fit anyway: the typical input alone is ~420 tokens. The guard does not estimate whether a given task will fit; it only enforces the floor.
A blocked task costs nothing. The model is never called for task 3, so the used count stays at 902 after the exception.
The wrapper needs no framework changes. result.metrics.plan_tokens.total_tokens is the only hook into the framework. The rest is plain Python.