Illustration Part 2: Token Economics
Tool-calling re-reads everything on every call. Here's exactly where the tokens go.
In Part 1, we saw that tool-calling agents lose track of their instructions as context grows. This part follows the tokens call by call to show why.
Where Tokens Go#
Behaviour Programming#
┌─────────────────────────────────────────────────────────────┐
│ PLANNING CALL → ~1,800 input tokens │
│ (Sees: primitives + decomposition examples + query) │
│ (Outputs: executable plan) │
└─────────────────────────────────────────────────────────────┘
↓ Python executes plan
┌─────────────────────────────────────────────────────────────┐
│ EXECUTION CALL (extract_answer) → ~1,000 input tokens │
│ (Sees: context + question only) │
│ (Outputs: answer) │
└─────────────────────────────────────────────────────────────┘
TOTAL: ~2,800 tokens
Tool-Calling#
┌─────────────────────────────────────────────────────────────┐
│ CALL 1: "What tool first?" → 1,200 tokens │
│ (Sees: full instructions + query) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CALL 2: "Got docs, now what?" → 4,500 tokens │
│ (Sees: instructions + query + retrieved docs) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CALL 3: "Got answer, done?" → 5,800 tokens │
│ (Sees: instructions + query + docs + previous answer) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CALL 4: "Really done?" → 6,200 tokens │
│ (Sees: everything above + more) │
└─────────────────────────────────────────────────────────────┘
... continues looping ...
TOTAL: ~22,000+ tokens
The tool-calling approach re-reads the instruction prompt (~1,000 tokens) on every single call. After retrieval, it also re-reads all retrieved documents on every call. The context accumulates linearly with each step.
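The accumulation can be sketched in a few lines. The per-call figures match the diagrams above; the 200-token query and the individual document/output increments are illustrative assumptions chosen so the four calls reproduce the same totals.

```python
# Sketch: tool-calling totals grow because every call re-reads the full
# history, while behaviour programming makes two isolated calls.
# Sizes are illustrative, calibrated to the diagrams above.

def tool_calling_cost(instructions: int, query: int, step_outputs: list[int]) -> int:
    """Total input tokens when every call re-reads the full history."""
    context = instructions + query  # call 1 sees instructions + query
    total = 0
    for output in step_outputs:
        total += context   # this call re-reads everything accumulated so far
        context += output  # and its output is carried into the next call
    return total

def behaviour_cost(planning: int, execution: int) -> int:
    """Two isolated calls: plan once, then execute with only what's needed."""
    return planning + execution

# Reproduce the diagrams: 1,200-token base, then increments of 3,300
# (retrieved docs), 1,300 and 400 (tool outputs), then one more loop.
print(tool_calling_cost(1_000, 200, [3_300, 1_300, 400, 0]))
# 17700 after just four calls (1,200 + 4,500 + 5,800 + 6,200), still looping
print(behaviour_cost(1_800, 1_000))
# 2800, and it stops there
```

Note that the *per-call* context grows linearly, so the *total* input tokens grow quadratically with the number of calls. That is why the loop gets expensive fast.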
Per-Query Breakdown#
| Query | Behaviour Tokens | Tool-Call Tokens | Ratio |
|---|---|---|---|
| "What is machine learning?" | 2,862 | 22,895 | 8.0x |
| "What is artificial intelligence?" | 3,046 | 7,006 | 2.3x |
| "How do supervised and unsupervised learning differ?" | 5,295 | 13,778 | 2.6x |
| "What is deep learning...?" | 5,625 | 8,865 | 1.6x |
The variance in tool-calling is itself a problem. Sometimes it uses 2.3x more tokens, sometimes 8x. The behaviour approach is consistent: plan once, execute the plan.
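A quick sanity check on the Ratio column, using the token counts copied straight from the table (the truncated query title is kept as-is):

```python
# Recompute the per-query ratios from the table above.
queries = [
    ("What is machine learning?", 2_862, 22_895),
    ("What is artificial intelligence?", 3_046, 7_006),
    ("How do supervised and unsupervised learning differ?", 5_295, 13_778),
    ("What is deep learning...?", 5_625, 8_865),
]

for query, behaviour, tool in queries:
    print(f"{query[:45]:45s} {tool / behaviour:.1f}x")
# 8.0x, 2.3x, 2.6x, 1.6x — same spread as the table
```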
The Accumulation Pattern#
With tool-calling, each step adds to the context:
| Call | New Content | Cumulative |
|---|---|---|
| 1 | Instructions + query | 1,200 |
| 2 | + retrieved docs | 4,500 |
| 3 | + tool output | 5,800 |
| 4 | + more output | 6,200 |
| ... | + more... | keeps growing |
With behaviour programming, each primitive call is isolated:
| Call | Content | Size |
|---|---|---|
| Planning | Signatures + examples + query | 1,800 |
| extract_answer | Context + question (only what's needed) | 1,000 |
The planning call doesn't carry forward into execution. Each primitive sees only what it needs.
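A hypothetical sketch of that isolation, with `call_llm` as a stand-in stub (not a real API): each primitive assembles its own minimal prompt from scratch, and nothing is appended to a shared transcript between calls.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a model API call."""
    return f"<response to {len(prompt)} chars of prompt>"

def plan(signatures: str, examples: str, query: str) -> str:
    # Planning call (~1,800 tokens): sees only signatures + examples + query.
    return call_llm(f"{signatures}\n{examples}\nQuery: {query}")

def extract_answer(context: str, question: str) -> str:
    # Execution call (~1,000 tokens): sees only context + question.
    # The planning prompt is never carried forward into this call.
    return call_llm(f"Context: {context}\nQuestion: {question}")
```

The key design point: `extract_answer` takes `context` and `question` as explicit arguments, so its prompt size is bounded by its inputs, not by how many steps ran before it.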
Why This Matters#
The 60% token reduction isn't just about cost (though that matters; see Part 3). It's about what the model can attend to.
A model processing 22,000 tokens has its attention spread thin: instructions from the beginning of the context compete with retrieved documents in the middle, which in turn compete with recent tool outputs.
A model processing 2,800 tokens can focus. The planning call sees only what's needed to plan. The execution call sees only what's needed to answer.
Smaller context = sharper attention = better results.
Next: Part 3: Cost & Reliability, the practical implications
Series: Part 1: Attention Loss | Part 2: Token Economics ← you are here | Part 3: Cost & Reliability