Read the metrics
What a run cost in time and tokens, read from result.metrics. Planning is slow and uses tokens; executing is fast and uses none.
Before you start
Track 8 read result.plan. Track 9 read result.trace. This track reads
result.metrics: what the run cost. The numbers split cleanly in two. Planning (the one model call that wrote the
plan) took almost all of the time and all of the tokens. Executing the plan,
your primitives running in plain Python, took almost no time and no tokens.
Same agent as Track 9
The same shopping cart: new_cart, add_item, and cart_total. Nothing about
it changes. This track just asks a different question of the result it returns.
Read the metrics
Run the cart, then read the fields off result.metrics.
# main.py
result = agent.run(QUERY)  # same agent and cart query as Track 9
m = result.metrics # ExecutionMetrics
print("provider:", m.provider)
print("model:", m.model)
print("steps executed:", m.steps_executed)
print("plan tokens:", m.plan_tokens)
print(" input:", m.plan_tokens.input_tokens)
print(" output:", m.plan_tokens.output_tokens)
print(" total:", m.plan_tokens.total_tokens)
print("plan time (s):", round(m.plan_time_seconds, 4))
print("execute time (s):", round(m.execute_time_seconds, 4))
print("total time (s):", round(m.total_time_seconds, 4))

uv run main.py

Sample output:
provider: ollama
model: qwen2.5-coder:7b
steps executed: 5
plan tokens: input_tokens=396 output_tokens=63
input: 396
output: 63
total: 459
plan time (s): 1.5179
execute time (s): 0.0047
total time (s): 1.5226

These exact numbers will not be yours. Token counts and timings depend on the model, the provider, the machine, and the wording of the task. Read them for shape and proportion, not as fixed values.
What you're looking at
Look at the two times. plan_time_seconds is 1.5179; execute_time_seconds is
0.0047. The model spent about a second and a half writing the plan. Running that
plan took under five thousandths of a second. Planning was about 99.7% of the
wall-clock time; your five primitive calls were the rounding error.
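To check that arithmetic yourself, here is a small sketch that turns the two phase times into percentages. It assumes, as the sample output suggests (1.5179 + 0.0047 = 1.5226), that total time is plan time plus execute time. phase_shares is a hypothetical helper written for this page, not part of the library.

```python
def phase_shares(plan_s: float, execute_s: float) -> dict[str, float]:
    """Split a run's wall-clock time into planning and executing shares.

    Hypothetical helper: assumes total time = plan time + execute time,
    which holds for the sample output above.
    """
    total = plan_s + execute_s
    return {
        "total_s": total,
        "planning_pct": 100 * plan_s / total,
        "executing_pct": 100 * execute_s / total,
        "plan_to_execute_ratio": plan_s / execute_s,
    }


# Sample values from the output above; your numbers will differ.
shares = phase_shares(1.5179, 0.0047)
print(f"total:     {shares['total_s']:.4f} s")      # 1.5226 s
print(f"planning:  {shares['planning_pct']:.1f}%")  # 99.7%
print(f"executing: {shares['executing_pct']:.1f}%") # 0.3%
print(f"planning took ~{shares['plan_to_execute_ratio']:.0f}x as long as executing")  # ~323x
```

With a real run you would pass m.plan_time_seconds and m.execute_time_seconds instead of the literals.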
That gap sums up the library's design in one number. Thinking is slow and happens once, up front, in the model. Doing is fast and happens in your process, in plain Python you wrote and tested.
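The plan-once, execute-locally shape can be mimicked in a few lines of standard-library Python. This is a toy, not the library's implementation: fake_plan stands in for the model call with a fixed 50 ms sleep, and new_cart, add_item and cart_total are hypothetical versions of the Track 9 primitives whose signatures are invented for this sketch. The metric names are borrowed from ExecutionMetrics only to make the correspondence easy to see.

```python
import time


# Hypothetical stand-ins for the Track 9 primitives (signatures invented here).
def new_cart() -> dict:
    return {"items": []}


def add_item(cart: dict, name: str, price: float, qty: int = 1) -> None:
    cart["items"].append({"name": name, "price": price, "qty": qty})


def cart_total(cart: dict) -> float:
    return sum(item["price"] * item["qty"] for item in cart["items"])


def fake_plan(task: str) -> list[tuple[str, tuple]]:
    """Stand-in for the single model call that writes the plan (ignores task)."""
    time.sleep(0.05)  # simulated model latency; real model calls are usually slower
    return [
        ("new_cart", ()),
        ("add_item", ("apple", 0.5, 4)),
        ("add_item", ("bread", 2.25, 2)),
        ("add_item", ("coffee", 3.0)),
        ("cart_total", ()),
    ]


def execute(plan: list[tuple[str, tuple]]) -> float:
    """Run the plan step by step in plain Python: no model call, no tokens."""
    cart = None
    for name, args in plan:
        if name == "new_cart":
            cart = new_cart()
        elif name == "add_item":
            add_item(cart, *args)
        elif name == "cart_total":
            return cart_total(cart)
        else:
            raise ValueError(f"unknown step: {name}")
    raise ValueError("plan never called cart_total")


def run(task: str) -> dict:
    t0 = time.perf_counter()
    plan = fake_plan(task)   # slow, happens once, up front
    t1 = time.perf_counter()
    total = execute(plan)    # fast, happens in this process
    t2 = time.perf_counter()
    return {
        "steps_executed": len(plan),
        "plan_time_seconds": t1 - t0,
        "execute_time_seconds": t2 - t1,
        "total_time_seconds": t2 - t0,
        "cart_total": total,
    }


metrics = run("Buy four apples, two loaves of bread and a coffee, then total the cart.")
for key, value in metrics.items():
    print(f"{key}: {value}")
```

Even with a simulated planner far quicker than a real model, executing is a small fraction of the total, because it is just five ordinary function calls.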
A note on TokenUsage
plan_tokens is a TokenUsage object with input_tokens, output_tokens, and
total_tokens. The input tokens are the prompt the model read: your primitive
signatures and the task. The output tokens are the plan it wrote back.
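The three fields are related by plain addition. The dataclass below is a hypothetical mirror written for illustration; the real TokenUsage may store total_tokens as a field rather than compute it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsageSketch:
    """Hypothetical mirror of the three TokenUsage fields described above."""

    input_tokens: int   # the prompt the model read: primitive signatures + task
    output_tokens: int  # the plan the model wrote back

    @property
    def total_tokens(self) -> int:
        # Total is prompt plus plan.
        return self.input_tokens + self.output_tokens


usage = TokenUsageSketch(input_tokens=396, output_tokens=63)
print(usage.total_tokens)  # 459
print(f"prompt share: {usage.input_tokens / usage.total_tokens:.0%}")  # 86%
```

In the sample, the prompt is most of the bill (396 of 459 tokens), so adding primitives or lengthening the task raises input_tokens on every run.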
The field is called plan_tokens, not total_tokens, because executing the
plan cost zero tokens. Nothing went back to the model: the cart was built and
totalled in your process. There is no execute_tokens because it would always
be zero.