Fetched data stays in Python, not the prompt
The agent downloads full Wikipedia articles and analyses them as Python variables. The text never re-enters the model context, whether the article is 30,000 characters or 80,000.
Before you start
Track 22 showed that a list generated by one primitive reaches the next as a
Python variable. This track applies the same idea to real-world data: full
Wikipedia articles downloaded at plan execution time. The model writes the plan
before any article is fetched. When the plan runs, each fetch() call stores
the article text in a variable. Everything after that operates on those variables
in pure Python. The article content never enters the planning prompt.
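The ordering described above can be sketched without the framework. This is a minimal illustration under stated assumptions, not how opensymbolicai executes plans: fake_fetch, run_plan, and the canned article strings are hypothetical stand-ins, and exec is used only to keep the sketch short.

```python
# Hypothetical stand-in for a network fetch: returns canned text of known size.
FAKE_ARTICLES = {
    "Python": "python " * 5000,
    "JavaScript": "javascript " * 3000,
}


def fake_fetch(topic: str) -> str:
    return FAKE_ARTICLES[topic]


def word_count(text: str) -> int:
    return len(text.split())


# The plan is plain text, written before any fetch has happened.
PLAN = """
python_text = fetch(topic="Python")
javascript_text = fetch(topic="JavaScript")
result = word_count(text=python_text) - word_count(text=javascript_text)
"""


def run_plan(plan: str) -> dict:
    """Execute the plan; article text lives only in the returned namespace."""
    namespace = {"fetch": fake_fetch, "word_count": word_count}
    exec(plan, namespace)
    return namespace


ns = run_plan(PLAN)
plan_chars = len(PLAN)
article_chars = len(ns["python_text"]) + len(ns["javascript_text"])
print(f"plan: {plan_chars} chars, articles: {article_chars} chars, result: {ns['result']}")
```

The plan text stays a few hundred characters long however large the canned articles are made; only the namespace grows.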
The agent: WikipediaAnalyst
The agent exposes a fetch primitive and a set of text-analysis primitives.
# analyst.py
import json, re, urllib.parse, urllib.request
from collections import Counter

from opensymbolicai.blueprints import PlanExecute
from opensymbolicai.core import primitive

_TITLES = {
    "python": "Python (programming language)",
    "javascript": "JavaScript",
    "artificial intelligence": "Artificial intelligence",
    "computer science": "Computer science",
    "alan turing": "Alan Turing",
    # ... more disambiguation entries
}


def _fetch_wikipedia(topic: str) -> str:
    headers = {"User-Agent": "osai-tutorial/1.0"}
    # Known topics map straight to an article title; others go through search.
    title = _TITLES.get(topic.lower().strip())
    if title is None:
        search_url = (
            "https://en.wikipedia.org/w/api.php?action=query&list=search"
            f"&srsearch={urllib.parse.quote(topic)}&format=json&srlimit=1"
        )
        req = urllib.request.Request(search_url, headers=headers)
        with urllib.request.urlopen(req, timeout=10) as resp:
            results = json.loads(resp.read().decode("utf-8"))
        hits = results.get("query", {}).get("search", [])
        # Fall back to the raw topic when the search returns nothing.
        title = hits[0]["title"] if hits else topic
    # Request the article as plain text rather than HTML.
    extract_url = (
        "https://en.wikipedia.org/w/api.php?action=query&prop=extracts"
        "&explaintext=1&exsectionformat=plain"
        f"&titles={urllib.parse.quote(title)}&format=json"
    )
    req = urllib.request.Request(extract_url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    pages = data.get("query", {}).get("pages", {})
    # Default to an empty page so a response without pages yields "".
    page = next(iter(pages.values()), {})
    return page.get("extract", "")


class WikipediaAnalyst(PlanExecute):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._cache: dict[str, str] = {}

    @primitive(read_only=True)
    def fetch(self, topic: str) -> str:
        """Fetch the Wikipedia article text for a topic (e.g. 'Alan Turing')."""
        if topic not in self._cache:
            self._cache[topic] = _fetch_wikipedia(topic)
        return self._cache[topic]

    @primitive(read_only=True)
    def word_count(self, text: str) -> int:
        """Number of words in the text."""
        return len(text.split())

    @primitive(read_only=True)
    def count_mentions(self, text: str, keyword: str) -> int:
        """Number of times keyword appears in text (case-insensitive)."""
        return len(re.findall(re.escape(keyword.lower()), text.lower()))

    @primitive(read_only=True)
    def most_common_words(self, text: str, n: int = 5) -> list[str]:
        """Top n most frequent words in the text."""
        words = re.findall(r"[a-z]+", text.lower())
        return [word for word, _ in Counter(words).most_common(n)]

    @primitive(read_only=True)
    def sentiment_score(self, text: str) -> float:
        """Ask the model to score the sentiment of text from -1.0 to +1.0."""
        prompt = (
            "Rate the overall sentiment of the following text on a scale from "
            "-1.0 (very negative) to +1.0 (very positive). "
            "Reply with only a single decimal number, nothing else.\n\n"
            f"{text}"
        )
        response = self._llm.generate(prompt)
        try:
            return round(float(response.text.strip()), 4)
        except ValueError:
            return 0.0

    @primitive(read_only=True)
    def sentiment_label(self, score: float) -> str:
        """Map a sentiment score to 'positive', 'neutral', or 'negative'."""
        if score > 0.1:
            return "positive"
        if score < -0.1:
            return "negative"
        return "neutral"

    @primitive(read_only=True)
    def concat(self, a: str, b: str) -> str:
        """Concatenate two texts."""
        return a + " " + b

    @primitive(read_only=True)
    def compare_counts(
        self, label_a: str, count_a: int, label_b: str, count_b: int
    ) -> str:
        """Return a formatted comparison: 'Python (1,247) > JavaScript (891)'."""
        if count_a > count_b:
            return f"{label_a} ({count_a:,}) > {label_b} ({count_b:,})"
        elif count_b > count_a:
            return f"{label_b} ({count_b:,}) > {label_a} ({count_a:,})"
        return f"{label_a} = {label_b} ({count_a:,})"

The model sees only the signatures. It knows fetch takes a topic: str and
returns str. It does not know how large that string will be, and it does not
see the content when the plan executes.
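To make "sees only the signatures" concrete, here is a sketch of how a description of that kind could be built with the standard inspect module. It is an assumption about the general technique, not the framework's actual prompt builder: TextTools and describe are hypothetical names, and the real prompt format may differ.

```python
import inspect


class TextTools:
    """Hypothetical stand-in for an agent class; not the framework's base class."""

    def fetch(self, topic: str) -> str:
        """Fetch the Wikipedia article text for a topic."""
        raise NotImplementedError  # the body is irrelevant to the planner

    def count_mentions(self, text: str, keyword: str) -> int:
        """Number of times keyword appears in text (case-insensitive)."""
        return text.lower().count(keyword.lower())


def describe(cls: type) -> list[str]:
    """Render each public method as 'name(params) -> return: docstring'."""
    lines = []
    for name, fn in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        sig = inspect.signature(fn)
        # Drop 'self' so the description matches how a plan calls the primitive.
        params = [p for p in sig.parameters.values() if p.name != "self"]
        sig = sig.replace(parameters=params)
        lines.append(f"{name}{sig}: {inspect.getdoc(fn)}")
    return lines


for line in describe(TextTools):
    print(line)
```

Nothing in the rendered lines depends on what fetch returns at run time, which is why the planner can work before any article exists.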
Run four tasks
# main.py
from analyst import WikipediaAnalyst
from opensymbolicai.llm import LLMConfig

TASKS = [
    "Which has a longer Wikipedia article: Python or JavaScript?",
    "Does 'algorithm' appear more in the Artificial Intelligence article or the Computer Science article?",
    "What are the 5 most common words across the Python and JavaScript articles combined?",
    "What is the sentiment of the Alan Turing Wikipedia article?",
]

llm = LLMConfig(provider="ollama", model="qwen2.5-coder:7b")

for task in TASKS:
    agent = WikipediaAnalyst(llm=llm)
    result = agent.run(task)
    print(f"Task: {task}")
    print(f"Result: {result.result}")
    print("Plan:")
    for line in result.plan.splitlines():
        print(f"  {line}")
    print()

uv run main.py

Output:
Task: Which has a longer Wikipedia article: Python or JavaScript?
Result: Python (6,143) > JavaScript (5,079)
Plan:
  python_text = fetch(topic="Python")
  javascript_text = fetch(topic="JavaScript")
  python_word_count = word_count(text=python_text)
  javascript_word_count = word_count(text=javascript_text)
  return compare_counts("Python", python_word_count, "JavaScript", javascript_word_count)

Task: Does 'algorithm' appear more in the Artificial Intelligence article or the Computer Science article?
Result: AI (25) > CS (12)
Plan:
  ai_article = fetch(topic='Artificial Intelligence')
  cs_article = fetch(topic='Computer Science')
  ai_count = count_mentions(text=ai_article, keyword='algorithm')
  cs_count = count_mentions(text=cs_article, keyword='algorithm')
  return compare_counts('AI', ai_count, 'CS', cs_count)

Task: What are the 5 most common words across the Python and JavaScript articles combined?
Result: ['the', 'a', 'and', 'to', 'in']
Plan:
  python_text = fetch(topic="Python")
  javascript_text = fetch(topic="JavaScript")
  combined_text = concat(a=python_text, b=javascript_text)
  return most_common_words(text=combined_text, n=5)

Task: What is the sentiment of the Alan Turing Wikipedia article?
Result: positive
Plan:
  article_text = fetch(topic="Alan Turing")
  score = sentiment_score(text=article_text)
  return sentiment_label(score=score)

The plan for each task is generated before any article is fetched. No article
text exists in the program when the model writes fetch(topic="Python"). The model
knows the primitive takes a string topic and returns a string; that is enough
to plan with.
How the data moves
OSAI -- articles as Python variables
----------------------------------------------------------
fetch("Artificial Intelligence")          fetch("Computer Science")
        |                                         |
        | 84,083 chars                            | 29,882 chars
        | (Python namespace)                      | (Python namespace)
        v                                         v
count_mentions(ai_article, "algorithm")   count_mentions(cs_article, "algorithm")
        |                                         |
        +--------------------+--------------------+
                             v
        compare_counts("AI", 25, "CS", 12) --> "AI (25) > CS (12)"
----------------------------------------------------------
model sees: 5 lines of code, 0 chars of article text

Tool-calling loop -- articles through the context window
----------------------------------------------------------
fetch("Artificial Intelligence")
        |
        v
context: "Artificial intelligence (AI) is the simulation of...
          ... (84,083 chars)"              <- model reads the full article
        |
        v
context: (AI article) + "Computer science is the study of...
          ... (29,882 chars)"              <- +29,882 chars more
        |
        v
count_mentions("Artificial intelligence...", "algorithm")   <- full article again
count_mentions("Computer science...", "algorithm")          <- full article again
----------------------------------------------------------
model sees: 113,965 chars of text, multiple times over

The sentiment_score exception
sentiment_score makes a direct model call from inside the primitive:
    @primitive(read_only=True)
    def sentiment_score(self, text: str) -> float:
        prompt = "Rate the overall sentiment ... \n\n" + text
        response = self._llm.generate(prompt)
        ...

This is a model call, but it happens inside a primitive during plan execution,
not during plan generation. The planning step still never sees the article text.
The plan just contains score = sentiment_score(text=article_text). The full
article is passed to the model only when that line runs.
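Because this primitive depends on a free-text model reply, the parsing step is where it is most likely to go wrong. The listing falls back to 0.0 whenever float() fails, which also happens when a model wraps the number in words. The sketch below shows one more tolerant way to read the reply; parse_score is a hypothetical helper that is not part of the tutorial code, and clamping to the requested range is an added design choice.

```python
import re

# Optional sign, optional integer part, optional decimal point, at least one digit.
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def parse_score(reply: str, default: float = 0.0) -> float:
    """Extract the first number from a model reply and clamp it to [-1.0, 1.0]."""
    match = _NUMBER.search(reply)
    if match is None:
        return default
    value = float(match.group())
    return round(max(-1.0, min(1.0, value)), 4)


print(parse_score("0.35"))
print(parse_score("Score: -0.8 (fairly negative)"))
print(parse_score("positive"))
print(parse_score("7"))
```

A reply with no number still falls back to the default, matching the behaviour of the original except branch.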
Primitive reference
| Primitive | What it does |
|---|---|
| fetch(topic) | Download a full Wikipedia article by topic name |
| word_count(text) | Count words |
| char_count(text) | Count characters |
| unique_word_count(text) | Count distinct words |
| count_mentions(text, keyword) | Count occurrences of a keyword |
| most_common_words(text, n) | Top n most frequent words |
| sentiment_score(text) | Score sentiment from -1.0 to +1.0 (model call inside primitive) |
| sentiment_label(score) | Map score to positive / neutral / negative |
| concat(a, b) | Join two texts |
| compare_counts(label_a, count_a, label_b, count_b) | Format a comparison string |
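The table lists char_count and unique_word_count, which do not appear in the analyst.py listing. A minimal sketch of what they might look like follows, written as plain functions. The bodies are assumptions inferred from the table descriptions and from the tokenization most_common_words uses; in the agent they would presumably be methods decorated with @primitive(read_only=True) like the others.

```python
import re


def char_count(text: str) -> int:
    """Number of characters in the text, including whitespace."""
    return len(text)


def unique_word_count(text: str) -> int:
    """Number of distinct lowercase alphabetic words in the text."""
    # Assumed to share most_common_words' tokenization: runs of a-z after lowercasing.
    return len(set(re.findall(r"[a-z]+", text.lower())))


sample = "The plan runs. The plan returns."
print(char_count(sample))
print(unique_word_count(sample))
```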
What to notice
- The plan is the same shape regardless of article length. fetch returns about 30,000 characters for one article and more than 80,000 for another. The plan line is identical either way.
- Multiple fetches pipe independently. The comparison tasks fetch two articles. Each lands in its own variable; neither interferes with the other.
- sentiment_score is the only model call after planning. Read the output for the Alan Turing task and compare its plan to the others. The plan is three lines: fetch, score, label. The model call for scoring happens when sentiment_score executes, not when the plan is written.
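The first point can be checked with arithmetic, using the character counts from the data-flow diagram. The sketch below models a tool-calling loop under two simplifying assumptions that are mine, not the tutorial's: the model is called once after each fetch result is appended, and the whole context is re-sent on every call. Prompt text and tool-call arguments are ignored, so the totals are a lower bound.

```python
from itertools import accumulate

# Article sizes from the data-flow diagram in this track.
AI_CHARS = 84_083
CS_CHARS = 29_882


def context_after_each_fetch(article_sizes: list[int]) -> list[int]:
    """Article characters sitting in context after each fetch result is appended."""
    return list(accumulate(article_sizes))


def total_chars_read(article_sizes: list[int]) -> int:
    """Characters read when the full context is re-sent after every fetch."""
    return sum(context_after_each_fetch(article_sizes))


sizes = [AI_CHARS, CS_CHARS]
print(context_after_each_fetch(sizes))
print(total_chars_read(sizes))
```

With these two articles the context reaches 113,965 characters and the model reads 198,048 characters in total, while the plan-then-execute approach reads none of the article text during planning.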