Read the metrics

Track 8 read result.plan. Track 9 read result.trace. This track reads result.metrics: what the run cost. The numbers split cleanly in two. Planning (the one model call that wrote the plan) took almost all of the time and all of the tokens. Executing the plan, your primitives running in plain Python, took almost no time and no tokens.

Same agent as Track 9

The same shopping cart: new_cart, add_item, and cart_total. Nothing about it changes. This track just asks a different question of the result it returns.

Read the metrics

Run the cart, then read the fields off result.metrics.

python

# main.py
# agent and QUERY are defined earlier in this file: the same cart agent as Track 9.
result = agent.run(QUERY)

m = result.metrics  # ExecutionMetrics

print("provider:", m.provider)
print("model:", m.model)
print("steps executed:", m.steps_executed)
print("plan tokens:", m.plan_tokens)
print("  input:", m.plan_tokens.input_tokens)
print("  output:", m.plan_tokens.output_tokens)
print("  total:", m.plan_tokens.total_tokens)
print("plan time (s):", round(m.plan_time_seconds, 4))
print("execute time (s):", round(m.execute_time_seconds, 4))
print("total time (s):", round(m.total_time_seconds, 4))

bash

uv run main.py

Sample output:

text

provider: ollama
model: qwen2.5-coder:7b
steps executed: 5
plan tokens: input_tokens=396 output_tokens=63
  input: 396
  output: 63
  total: 459
plan time (s): 1.5179
execute time (s): 0.0047
total time (s): 1.5226

These exact numbers will not be yours. Token counts and timings depend on the model, the provider, the machine, and the wording of the task. Read them for shape and proportion, not as fixed values.

What you're looking at

Look at the two times. plan_time_seconds is 1.5179; execute_time_seconds is 0.0047. The model spent about a second and a half writing the plan. Running that plan took under five thousandths of a second. Planning was about 99.7% of the wall clock; your five primitive calls were the rounding error.
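The 99.7% figure is plain arithmetic on the sample output. Here is a minimal sketch that reproduces it, with the sample values hard-coded as floats rather than read from a live result.metrics. In this sample the total time equals the sum of the two parts, and the sketch assumes the same.

```python
# Sample values copied from the output above; your own run will differ.
plan_time_seconds = 1.5179
execute_time_seconds = 0.0047

# Assumption: total time is plan time plus execute time, as in the sample.
total_time_seconds = plan_time_seconds + execute_time_seconds

plan_share = plan_time_seconds / total_time_seconds
execute_share = execute_time_seconds / total_time_seconds

print(f"planning:  {plan_share:.1%}")
print(f"executing: {execute_share:.1%}")
```

With your own run, substitute m.plan_time_seconds and m.execute_time_seconds for the two constants; the proportions matter more than the absolute values.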

That gap shows the library's model in one number. Thinking is slow and happens once, up front, in the model. Doing is fast and happens in your process, in plain Python you wrote and tested.

A note on TokenUsage

plan_tokens is a TokenUsage object with input_tokens, output_tokens, and total_tokens. The input tokens are the prompt the model read: your primitive signatures and the task. The output tokens are the plan it wrote back.
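To make the shape of those three fields concrete, here is a hypothetical stand-in. TokenUsageSketch is an illustrative name and not the library's real class, and deriving the total as input plus output is an assumption that happens to match the sample (396 + 63 = 459).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsageSketch:
    """Hypothetical stand-in for TokenUsage; not the library's definition."""

    input_tokens: int   # the prompt the model read
    output_tokens: int  # the plan the model wrote back

    @property
    def total_tokens(self) -> int:
        # Assumption: total is simply input plus output.
        return self.input_tokens + self.output_tokens


# Values taken from the sample output above.
usage = TokenUsageSketch(input_tokens=396, output_tokens=63)
print("total:", usage.total_tokens)
```

The real object may store total_tokens directly instead of computing it; the sketch only shows how the three numbers relate.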

The metrics field is called plan_tokens, not total_tokens, because planning is the only part of the run that uses tokens. Executing the plan cost zero: nothing went back to the model, and the cart was built and totalled in your process. There is no execute_tokens because it would always be zero.