
Illustration Part 1: The Attention Loss Problem

We gave a tool-calling agent detailed instructions. It ignored them. Here's why.

OpenSymbolicAI Team · February 1, 2026 · 4 min read

agents · illustration · performance · RAG · behaviour-programming

We compared two approaches to building RAG agents:

  • Behaviour Programming: Plan once, execute in Python
  • Tool-Calling: LLM decides after each tool result
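To make the contrast concrete, here is a minimal sketch of the two control flows, assuming placeholder `llm` and `tools` callables rather than any particular framework API:

```python
# Minimal sketch of the two control flows. `llm` is any text-in/text-out model
# call and `tools` maps tool names to plain Python functions; both are
# placeholders, not the actual framework API.

INSTRUCTIONS = "..."  # ~1,000 tokens of rules, query types, and guidance

def tool_calling_agent(question, llm, tools, max_turns=15):
    """Tool-calling: the LLM re-reads the whole transcript after every tool result."""
    transcript = [INSTRUCTIONS, f"Question: {question}"]
    for _ in range(max_turns):
        action = llm("\n".join(transcript))          # one planning call per turn
        if action.startswith("final_answer:"):
            return action.removeprefix("final_answer:").strip()
        transcript.append(tools[action](question))   # tool output grows the context

def behaviour_agent(question, llm, tools):
    """Behaviour programming: one planning call, then Python runs the steps."""
    plan = llm(INSTRUCTIONS + f"\nQuestion: {question}")  # e.g. "retrieve -> extract_answer"
    result = question
    for step in plan.split("->"):
        result = tools[step.strip()](result)         # no further planning calls
    return result
```

The detail that matters is where the loop lives: in the tool-calling version the LLM sits inside the loop, in the behaviour version it sits outside it. The measured results: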
| Metric | Behaviour | Tool-Calling | Difference |
|---|---|---|---|
| Total Tokens | 21,199 | 52,544 | 60% reduction |
| Planning Calls | 5 | 17 | 71% reduction |
| Total Time | 8.16s | 17.93s | 55% faster |
| Avg Tokens/Query | 4,240 | 10,509 | 2.5x fewer |

But the numbers only tell part of the story. The more damning finding is attention loss.

What We Observed

The tool-calling agent was given detailed instructions with query classification, chain-of-thought guidance, and common mistakes to avoid. Despite this, it consistently failed to follow its own instructions.

Query 1: "What is machine learning?"

This is a Type 1 Simple Factual Question. The instructions clearly state:

Strategy: retrieve(k=3) → extract_answer → final_answer

What each agent actually did:

| Agent | Execution Path | LLM Calls | Tokens |
|---|---|---|---|
| Behaviour | retrieve → extract_answer | 2 | 2,862 |
| Tool-calling | retrieve → extract_answer → retrieve → extract_answer → extract_answer → extract_answer → final_answer | 11 | 22,895 |

The tool-calling agent made 11 calls for a simple factual question. It looped through retrieve and extract_answer multiple times, unable to recognize it already had sufficient information. This is a 5.5x overhead in calls and 8x overhead in tokens.
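For comparison, the behaviour agent's two LLM calls are the planning call plus a single extraction call over the retrieved passages. A hypothetical sketch of that fixed pipeline (the `vector_store` helper is assumed, not the framework's real API):

```python
def answer_simple_factual(question, llm, vector_store):
    """Type 1 plan executed as plain Python: retrieve(k=3) -> extract_answer.
    Together with the one planning call that chose this path, that is 2 LLM calls."""
    docs = vector_store.search(question, k=3)   # retrieval itself spends no LLM tokens
    context = "\n\n".join(docs)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```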

Query 4: "What is deep learning and how does it relate to neural networks?"

This is a Type 2 Complex Technical Question. The instructions state:

Strategy: retrieve(k=5) → extract_answer (may need multiple) → final_answer

⚠️ IMPORTANT: For technical questions, you may need to retrieve more documents

| Agent | Execution Path | LLM Calls | Tokens |
|---|---|---|---|
| Behaviour | retrieve(k=8) → rerank(8 docs) → extract_answer | 10 | 5,625 |
| Tool-calling | retrieve → extract_answer → final_answer | 4 | 8,865 |

Here the tool-calling agent did the opposite: it under-executed, skipping the thorough processing the instructions called for. The behaviour agent used reranking to ensure high-quality results; the tool-calling agent rushed to an answer.
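One plausible accounting for the behaviour agent's 10 LLM calls is one planning call, one relevance-scoring call per retrieved document during reranking, and one extraction call. A sketch under that assumption, again with placeholder `llm` and `vector_store` helpers:

```python
def answer_complex_technical(question, llm, vector_store):
    """Type 2 plan: retrieve(k=8) -> rerank(8 docs) -> extract_answer.
    1 planning call + 8 scoring calls + 1 extraction call = 10 LLM calls."""
    docs = vector_store.search(question, k=8)
    scored = []
    for doc in docs:                           # small, isolated scoring call per document
        score = llm(f"Rate 0-10 how relevant this passage is to: {question}\n\n{doc}")
        scored.append((float(score), doc))
    top = [doc for _, doc in sorted(scored, key=lambda s: s[0], reverse=True)[:5]]
    return llm("Answer using only these passages:\n\n" + "\n\n".join(top)
               + f"\n\nQuestion: {question}")
```

Each of those calls sees only its own small prompt, which is why more calls did not mean more context per call.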

Why This Happens

[Figure: Token waste visualization showing how instructions get buried as context grows]

The tool-calling prompt was ~1,000 tokens of instructions, including:

  • 5 critical rules
  • 5 query type classifications with strategies
  • 6-step chain-of-thought process
  • 6 common mistakes to avoid
  • Response format requirements

On the first call, the model sees all of this. By the third or fourth call, it's also processing:

  • The full instruction set (again)
  • The original question
  • All previous tool calls and their results
  • Retrieved document content (thousands of tokens)

The instructions get buried. The model loses track of which query type it classified, what strategy it planned, and whether it's done. It either loops unnecessarily or exits prematurely.

This is not a prompt engineering problem. Adding more instructions makes it worse. The fundamental issue is that tool-calling requires the LLM to re-read everything on every turn, and attention degrades as context grows.
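A back-of-the-envelope illustration of the burial effect, using rough token figures assumed for illustration rather than measured from the run above:

```python
# Rough prompt-growth model for the tool-calling loop. All token figures are
# illustrative assumptions, not measurements from the benchmark above.
INSTRUCTION_TOKENS = 1_000   # rules, query types, chain-of-thought steps, common mistakes
QUESTION_TOKENS = 20
TOOL_RESULT_TOKENS = 1_500   # retrieved document content dominates quickly

for turn in range(1, 6):
    prompt = INSTRUCTION_TOKENS + QUESTION_TOKENS + (turn - 1) * TOOL_RESULT_TOKENS
    share = INSTRUCTION_TOKENS / prompt
    print(f"turn {turn}: ~{prompt:,} prompt tokens, instructions are {share:.0%} of them")
# turn 1: instructions are ~98% of the prompt; by turn 5 they are ~14%.
```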

The Core Insight

Behaviour programming avoids this entirely. The LLM plans once, when context is small and instructions are fresh. Then Python executes the plan without further LLM involvement.

[Figure: How behaviour programming preserves attention through isolated calls]

The instructions are processed exactly once. They can't get buried because there's no accumulating context to bury them in.


Next: Part 2: Token Economics, where the tokens actually go

Series: Part 1: Attention Loss ← you are here | Part 2: Token Economics | Part 3: Cost & Reliability