Illustration Part 2: Token Economics
Tool-calling re-reads everything on every call. Here's exactly where the tokens go.
In Part 1, we saw that tool-calling agents lose track of their instructions as context grows. This part follows the tokens call by call to show why.
Where Tokens Go#
Behaviour Programming#
┌─────────────────────────────────────────────────────────────┐
│ PLANNING CALL → ~1,800 input tokens │
│ (Sees: primitives + decomposition examples + query) │
│ (Outputs: executable plan) │
└─────────────────────────────────────────────────────────────┘
↓ Python executes plan
┌─────────────────────────────────────────────────────────────┐
│ EXECUTION CALL (extract_answer) → ~1,000 input tokens │
│ (Sees: context + question only) │
│ (Outputs: answer) │
└─────────────────────────────────────────────────────────────┘
TOTAL: ~2,800 tokens
Tool-Calling#
┌─────────────────────────────────────────────────────────────┐
│ CALL 1: "What tool first?" → 1,200 tokens │
│ (Sees: full instructions + query) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CALL 2: "Got docs, now what?" → 4,500 tokens │
│ (Sees: instructions + query + retrieved docs) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CALL 3: "Got answer, done?" → 5,800 tokens │
│ (Sees: instructions + query + docs + previous answer) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CALL 4: "Really done?" → 6,200 tokens │
│ (Sees: everything above + more) │
└─────────────────────────────────────────────────────────────┘
... continues looping ...
TOTAL: ~22,000+ tokens
The tool-calling approach re-reads the instruction prompt (~1,000 tokens) on every single call. After retrieval, it also re-reads all retrieved documents on every call. The context accumulates linearly with each step.
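The accumulation can be sketched in a few lines. The per-call figures match the diagrams above; the 200-token query and the individual document/output increments are illustrative assumptions chosen so the four calls reproduce the same totals.

```python
# Sketch: tool-calling totals grow because every call re-reads the full
# history, while behaviour programming makes two isolated calls.
# Sizes are illustrative, calibrated to the diagrams above.

def tool_calling_cost(instructions: int, query: int, step_outputs: list[int]) -> int:
    """Total input tokens when every call re-reads the full history."""
    context = instructions + query  # call 1 sees instructions + query
    total = 0
    for output in step_outputs:
        total += context   # this call re-reads everything accumulated so far
        context += output  # and its output is carried into the next call
    return total

def behaviour_cost(planning: int, execution: int) -> int:
    """Two isolated calls: plan once, then execute with only what's needed."""
    return planning + execution

# Reproduce the diagrams: 1,200-token base, then increments of 3,300
# (retrieved docs), 1,300 and 400 (tool outputs), then one more loop.
print(tool_calling_cost(1_000, 200, [3_300, 1_300, 400, 0]))
# 17700 after just four calls (1,200 + 4,500 + 5,800 + 6,200), still looping
print(behaviour_cost(1_800, 1_000))
# 2800, and it stops there
```

Note that the *per-call* context grows linearly, so the *total* input tokens grow quadratically with the number of calls. That is why the loop gets expensive fast.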
Per-Query Breakdown#
| Query | Behaviour Tokens | Tool-Call Tokens | Ratio |
|---|---|---|---|
| "What is machine learning?" | 2,862 | 22,895 | 8.0x |
| "What is artificial intelligence?" | 3,046 | 7,006 | 2.3x |
| "How do supervised and unsupervised learning differ?" | 5,295 | 13,778 | 2.6x |
| "What is deep learning...?" | 5,625 | 8,865 | 1.6x |
The variance in tool-calling is itself a problem. Sometimes it uses 2.3x more tokens, sometimes 8x. The behaviour approach is consistent: plan once, execute the plan.
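A quick sanity check on the Ratio column, using the token counts copied straight from the table (the truncated query title is kept as-is):

```python
# Recompute the per-query ratios from the table above.
queries = [
    ("What is machine learning?", 2_862, 22_895),
    ("What is artificial intelligence?", 3_046, 7_006),
    ("How do supervised and unsupervised learning differ?", 5_295, 13_778),
    ("What is deep learning...?", 5_625, 8_865),
]

for query, behaviour, tool in queries:
    print(f"{query[:45]:45s} {tool / behaviour:.1f}x")
# 8.0x, 2.3x, 2.6x, 1.6x — same spread as the table
```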
The Accumulation Pattern#
With tool-calling, each step adds to the context:
| Call | New Content | Cumulative |
|---|---|---|
| 1 | Instructions + query | 1,200 |
| 2 | + retrieved docs | 4,500 |
| 3 | + tool output | 5,800 |
| 4 | + more output | 6,200 |
| ... | + more... | keeps growing |
With behaviour programming, each primitive call is isolated:
| Call | Content | Size |
|---|---|---|
| Planning | Signatures + examples + query | 1,800 |
| extract_answer | Context + question (only what's needed) | 1,000 |
The planning call doesn't carry forward into execution. Each primitive sees only what it needs.
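A hypothetical sketch of that isolation, with `call_llm` as a stand-in stub (not a real API): each primitive assembles its own minimal prompt from scratch, and nothing is appended to a shared transcript between calls.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a model API call."""
    return f"<response to {len(prompt)} chars of prompt>"

def plan(signatures: str, examples: str, query: str) -> str:
    # Planning call (~1,800 tokens): sees only signatures + examples + query.
    return call_llm(f"{signatures}\n{examples}\nQuery: {query}")

def extract_answer(context: str, question: str) -> str:
    # Execution call (~1,000 tokens): sees only context + question.
    # The planning prompt is never carried forward into this call.
    return call_llm(f"Context: {context}\nQuestion: {question}")
```

The key design point: `extract_answer` takes `context` and `question` as explicit arguments, so its prompt size is bounded by its inputs, not by how many steps ran before it.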
Why This Matters#
The 60% token reduction isn't just about cost (though that matters; see Part 3). It's about what the model can attend to.
A model processing 22,000 tokens has its attention spread thin: instructions from the beginning of the context compete with retrieved documents in the middle, which in turn compete with recent tool outputs.
A model processing 2,800 tokens can focus. The planning call sees only what's needed to plan. The execution call sees only what's needed to answer.
Smaller context = sharper attention = better results.
Next: Part 3: Cost & Reliability, the practical implications
Series: Part 1: Attention Loss | Part 2: Token Economics ← you are here | Part 3: Cost & Reliability