
Illustration Part 2: Token Economics

Tool-calling re-reads everything on every call. Here's exactly where the tokens go.

OpenSymbolicAI Team · February 1, 2026 · 3 min read

agents · illustration · performance · RAG · behaviour-programming

In Part 1, we saw that tool-calling agents lose track of their instructions as context grows. Now let's see exactly where the tokens go.

Where Tokens Go

Behaviour Programming

```text
┌─────────────────────────────────────────────────────────────┐
│  PLANNING CALL                    →  ~1,800 input tokens    │
│  (Sees: primitives + decomposition examples + query)        │
│  (Outputs: executable plan)                                 │
└─────────────────────────────────────────────────────────────┘
                    ↓ Python executes plan
┌─────────────────────────────────────────────────────────────┐
│  EXECUTION CALL (extract_answer)  →  ~1,000 input tokens    │
│  (Sees: context + question only)                            │
│  (Outputs: answer)                                          │
└─────────────────────────────────────────────────────────────┘
                TOTAL: ~2,800 tokens
```
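
A minimal sketch of that two-call shape in Python, assuming a generic `llm(prompt) -> str` completion helper and a `run_plan` executor — both hypothetical stand-ins for illustration, not the actual OpenSymbolicAI API:

```python
# Hypothetical sketch of the two-call behaviour-programming shape.
# `llm` and `run_plan` are illustrative stubs, not a real API.

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion call."""
    raise NotImplementedError

def run_plan(plan: str) -> str:
    """Stand-in: execute the plan in Python (e.g. retrieval), return context."""
    raise NotImplementedError

def answer(query: str, primitives: str, examples: str) -> str:
    # Planning call: primitives + decomposition examples + query (~1,800 tokens).
    plan = llm(f"{primitives}\n\n{examples}\n\nPlan the steps for: {query}")

    # Python executes the plan; no model tokens are spent here.
    context = run_plan(plan)

    # Execution call (extract_answer): context + question only (~1,000 tokens).
    # Nothing from the planning prompt is carried forward.
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```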

Tool-Calling

```text
┌─────────────────────────────────────────────────────────────┐
│  CALL 1: "What tool first?"       →  1,200 tokens           │
│  (Sees: full instructions + query)                          │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│  CALL 2: "Got docs, now what?"    →  4,500 tokens           │
│  (Sees: instructions + query + retrieved docs)              │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│  CALL 3: "Got answer, done?"      →  5,800 tokens           │
│  (Sees: instructions + query + docs + previous answer)      │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│  CALL 4: "Really done?"           →  6,200 tokens           │
│  (Sees: everything above + more)                            │
└─────────────────────────────────────────────────────────────┘
                ... continues looping ...
                TOTAL: ~22,000+ tokens
```

The tool-calling approach re-reads the instruction prompt (~1,000 tokens) on every single call, and re-reads every retrieved document on every call after retrieval. The per-call context grows linearly with each step, which means the total input cost across a run grows roughly quadratically with the number of calls.
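
Here is that loop sketched in Python. `count_tokens` approximates four characters per token, and `llm`/`run_tool` are the same kind of hypothetical stubs as in the sketch above:

```python
# Hypothetical sketch of the tool-calling loop: the full message history
# is re-sent as input on every call, so input size grows with each step.

def count_tokens(text: str) -> int:
    """Rough approximation: ~4 characters per token."""
    return len(text) // 4

def llm(prompt: str) -> str:
    """Stand-in for a chat-completion call that picks a tool or answers."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Stand-in for executing the chosen tool (e.g. retrieval)."""
    raise NotImplementedError

def tool_calling_loop(instructions: str, query: str, max_calls: int = 10):
    history = [instructions, query]
    total_input = 0
    for _ in range(max_calls):
        prompt = "\n".join(history)          # everything so far, re-read
        total_input += count_tokens(prompt)  # this call's input cost
        action = llm(prompt)
        if action.startswith("FINAL:"):
            return action, total_input
        history.append(action)               # model's tool choice
        history.append(run_tool(action))     # tool output: re-read on every
                                             # subsequent call
    return None, total_input
```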

Per-Query Breakdown

| Query | Behaviour Tokens | Tool-Call Tokens | Ratio |
| --- | --- | --- | --- |
| "What is machine learning?" | 2,862 | 22,895 | 8.0x |
| "What is artificial intelligence?" | 3,046 | 7,006 | 2.3x |
| "How do supervised and unsupervised learning differ?" | 5,295 | 13,778 | 2.6x |
| "What is deep learning...?" | 5,625 | 8,865 | 1.6x |

The variance in tool-calling is itself a problem. Sometimes it uses 2.3x more tokens, sometimes 8x. The behaviour approach is consistent: plan once, execute the plan.
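
A quick sanity check on those rows, using the numbers straight from the table:

```python
# Numbers copied from the per-query table above.
behaviour = [2_862, 3_046, 5_295, 5_625]
tool_call = [22_895, 7_006, 13_778, 8_865]

print([f"{t / b:.1f}x" for t, b in zip(tool_call, behaviour)])
# -> ['8.0x', '2.3x', '2.6x', '1.6x']

# Spread between the cheapest and most expensive query:
print(f"{max(tool_call) / min(tool_call):.1f}x")  # 3.3x for tool-calling
print(f"{max(behaviour) / min(behaviour):.1f}x")  # 2.0x for behaviour
```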

The Accumulation Pattern

With tool-calling, each step adds to the context:

| Call | New Content | Cumulative Context (tokens) |
| --- | --- | --- |
| 1 | Instructions + query | 1,200 |
| 2 | + retrieved docs | 4,500 |
| 3 | + tool output | 5,800 |
| 4 | + more output | 6,200 |
| ... | + more... | keeps growing |

With behaviour programming, each primitive call is isolated:

| Call | Content | Size (tokens) |
| --- | --- | --- |
| Planning | Signatures + examples + query | 1,800 |
| extract_answer | Context + question (only what's needed) | 1,000 |

The planning call doesn't carry forward into execution. Each primitive sees only what it needs.
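
To put numbers on the difference, here is a toy model: if each tool step appends a roughly fixed chunk to the history, per-call input grows linearly and the total across a run grows quadratically, while the behaviour side stays flat. The `base` and `step` sizes below are illustrative, loosely matched to the traces above:

```python
def tool_calling_total(n_calls: int, base: int = 1_200, step: int = 1_500) -> int:
    """Total input tokens when call k re-reads base + (k - 1) * step tokens.

    Linear per-call growth sums to quadratic total growth in n_calls.
    """
    return sum(base + (k - 1) * step for k in range(1, n_calls + 1))

def behaviour_total(planning: int = 1_800, execution: int = 1_000) -> int:
    """Two isolated calls; each sees only what it needs."""
    return planning + execution

print(tool_calling_total(4))   # 13,800 -- and climbing quadratically
print(behaviour_total())       # 2,800 -- flat
```

More primitives would add more execution calls, but each one stays small, because nothing accumulates between them.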

Why This Matters

The 60% token reduction isn't just about cost (though that matters; see Part 3). It's about what the model can attend to.

A model processing 22,000 tokens has its attention spread thin: instructions from the beginning compete with retrieved documents from the middle, which in turn compete with recent tool outputs.

A model processing 2,800 tokens can focus. The planning call sees only what's needed to plan. The execution call sees only what's needed to answer.

Smaller context = sharper attention = better results.


Next: Part 3: Cost & Reliability, which covers the practical implications.

Series: Part 1: Attention Loss | Part 2: Token Economics ← you are here | Part 3: Cost & Reliability