Fetched data stays in Python, not the prompt

Track 22 showed that a list generated by one primitive reaches the next as a Python variable. This track applies the same idea to real-world data: full Wikipedia articles downloaded at plan execution time. The model writes the plan before any article is fetched. When the plan runs, each fetch() call stores the article text in a variable. Everything after that operates on those variables in Python. The article content never enters the planning prompt; the one primitive that does send text to a model, sentiment_score, does so only while the plan is executing.

The agent: `WikipediaAnalyst`

The agent exposes a fetch primitive and a set of text-analysis primitives.

python

# analyst.py
import json, re, urllib.parse, urllib.request
from collections import Counter
from opensymbolicai.blueprints import PlanExecute
from opensymbolicai.core import primitive

_TITLES = {
    "python": "Python (programming language)",
    "javascript": "JavaScript",
    "artificial intelligence": "Artificial intelligence",
    "computer science": "Computer science",
    "alan turing": "Alan Turing",
    # ... more disambiguation entries
}

def _fetch_wikipedia(topic: str) -> str:
    headers = {"User-Agent": "osai-tutorial/1.0"}
    title = _TITLES.get(topic.lower().strip())
    if title is None:
        search_url = (
            "https://en.wikipedia.org/w/api.php?action=query&list=search"
            f"&srsearch={urllib.parse.quote(topic)}&format=json&srlimit=1"
        )
        req = urllib.request.Request(search_url, headers=headers)
        with urllib.request.urlopen(req, timeout=10) as resp:
            results = json.loads(resp.read().decode("utf-8"))
        hits = results.get("query", {}).get("search", [])
        title = hits[0]["title"] if hits else topic

    extract_url = (
        "https://en.wikipedia.org/w/api.php?action=query&prop=extracts"
        "&explaintext=1&exsectionformat=plain"
        f"&titles={urllib.parse.quote(title)}&format=json"
    )
    req = urllib.request.Request(extract_url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    pages = data.get("query", {}).get("pages", {})
    # Fall back to an empty page if the API returned no pages at all.
    page = next(iter(pages.values()), {})
    return page.get("extract", "")


class WikipediaAnalyst(PlanExecute):

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._cache: dict[str, str] = {}

    @primitive(read_only=True)
    def fetch(self, topic: str) -> str:
        """Fetch the Wikipedia article text for a topic (e.g. 'Alan Turing')."""
        if topic not in self._cache:
            self._cache[topic] = _fetch_wikipedia(topic)
        return self._cache[topic]

    @primitive(read_only=True)
    def word_count(self, text: str) -> int:
        """Number of words in the text."""
        return len(text.split())

    @primitive(read_only=True)
    def count_mentions(self, text: str, keyword: str) -> int:
        """Number of times keyword appears in text (case-insensitive)."""
        return len(re.findall(re.escape(keyword.lower()), text.lower()))

    @primitive(read_only=True)
    def most_common_words(self, text: str, n: int = 5) -> list[str]:
        """Top n most frequent words in the text."""
        words = re.findall(r"[a-z]+", text.lower())
        return [word for word, _ in Counter(words).most_common(n)]

    @primitive(read_only=True)
    def sentiment_score(self, text: str) -> float:
        """Ask the model to score the sentiment of text from -1.0 to +1.0."""
        prompt = (
            "Rate the overall sentiment of the following text on a scale from "
            "-1.0 (very negative) to +1.0 (very positive). "
            "Reply with only a single decimal number, nothing else.\n\n"
            f"{text}"
        )
        response = self._llm.generate(prompt)
        try:
            return round(float(response.text.strip()), 4)
        except ValueError:
            return 0.0

    @primitive(read_only=True)
    def sentiment_label(self, score: float) -> str:
        """Map a sentiment score to 'positive', 'neutral', or 'negative'."""
        if score > 0.1:
            return "positive"
        if score < -0.1:
            return "negative"
        return "neutral"

    @primitive(read_only=True)
    def concat(self, a: str, b: str) -> str:
        """Concatenate two texts."""
        return a + " " + b

    @primitive(read_only=True)
    def compare_counts(
        self, label_a: str, count_a: int, label_b: str, count_b: int
    ) -> str:
        """Return a formatted comparison: 'Python (1,247) > JavaScript (891)'."""
        if count_a > count_b:
            return f"{label_a} ({count_a:,}) > {label_b} ({count_b:,})"
        elif count_b > count_a:
            return f"{label_b} ({count_b:,}) > {label_a} ({count_a:,})"
        return f"{label_a} = {label_b} ({count_a:,})"

The model sees only the signatures. It knows fetch takes a topic: str and returns str. It does not know how large that string will be, and it does not see the content when the plan executes.
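
To make "sees only the signatures" concrete, here is a minimal sketch that renders one line per method using the standard `inspect` module. How opensymbolicai actually builds its planning prompt is not shown in this track, so `DemoAnalyst` and `describe` are hypothetical stand-ins, not the library's API.

```python
import inspect


class DemoAnalyst:
    """Hypothetical stand-in with the same method shapes as two primitives."""

    def fetch(self, topic: str) -> str:
        """Fetch the Wikipedia article text for a topic."""
        return "x" * 84_083  # placeholder body; its size never shows up in the signature

    def word_count(self, text: str) -> int:
        """Number of words in the text."""
        return len(text.split())


def describe(cls: type) -> list[str]:
    """Render one 'name(params) -> return: docstring' line per public method."""
    lines = []
    for name, fn in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        sig = inspect.signature(fn)
        # Drop 'self' so the line reads like a call a planner could write.
        params = [p for p in sig.parameters.values() if p.name != "self"]
        lines.append(f"{name}{sig.replace(parameters=params)}: {inspect.getdoc(fn)}")
    return lines


signature_lines = describe(DemoAnalyst)
for line in signature_lines:
    print(line)
```

Each rendered line stays the same length however large the string returned by `fetch` turns out to be.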

Run four tasks

python

# main.py
from analyst import WikipediaAnalyst
from opensymbolicai.llm import LLMConfig

TASKS = [
    "Which has a longer Wikipedia article: Python or JavaScript?",
    "Does 'algorithm' appear more in the Artificial Intelligence article or the Computer Science article?",
    "What are the 5 most common words across the Python and JavaScript articles combined?",
    "What is the sentiment of the Alan Turing Wikipedia article?",
]

llm = LLMConfig(provider="ollama", model="qwen2.5-coder:7b")

for task in TASKS:
    agent = WikipediaAnalyst(llm=llm)
    result = agent.run(task)
    print(f"Task:   {task}")
    print(f"Result: {result.result}")
    print("Plan:")
    for line in result.plan.splitlines():
        print(f"  {line}")
    print()

bash

uv run main.py

Output:

text

Task:   Which has a longer Wikipedia article: Python or JavaScript?
Result: Python (6,143) > JavaScript (5,079)
Plan:
  python_text = fetch(topic="Python")
  javascript_text = fetch(topic="JavaScript")
  python_word_count = word_count(text=python_text)
  javascript_word_count = word_count(text=javascript_text)
  return compare_counts("Python", python_word_count, "JavaScript", javascript_word_count)

Task:   Does 'algorithm' appear more in the Artificial Intelligence article or the Computer Science article?
Result: AI (25) > CS (12)
Plan:
  ai_article = fetch(topic='Artificial Intelligence')
  cs_article = fetch(topic='Computer Science')
  ai_count = count_mentions(text=ai_article, keyword='algorithm')
  cs_count = count_mentions(text=cs_article, keyword='algorithm')
  return compare_counts('AI', ai_count, 'CS', cs_count)

Task:   What are the 5 most common words across the Python and JavaScript articles combined?
Result: ['the', 'a', 'and', 'to', 'in']
Plan:
  python_text = fetch(topic="Python")
  javascript_text = fetch(topic="JavaScript")
  combined_text = concat(a=python_text, b=javascript_text)
  return most_common_words(text=combined_text, n=5)

Task:   What is the sentiment of the Alan Turing Wikipedia article?
Result: positive
Plan:
  article_text = fetch(topic="Alan Turing")
  score = sentiment_score(text=article_text)
  return sentiment_label(score=score)

The plan for each task is generated before any article is fetched. When the model writes fetch(topic="Python"), no article text exists in the program yet. The model knows the primitive takes a string topic and returns a string; that is enough to plan with.
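
The ordering can be sketched with a toy executor: the plan exists as a short string first, and article-sized data only appears once the lines run. This is an illustration under stated assumptions, not opensymbolicai's execution engine: `run_plan` and `FAKE_ARTICLES` are hypothetical, the articles are synthetic strings sized to match the word counts in the output above, and the final `return` is rewritten as `result = ...` so each line can be executed on its own.

```python
# The plan is plain text, written before any data exists.
PLAN = """python_text = fetch(topic="Python")
javascript_text = fetch(topic="JavaScript")
python_word_count = word_count(text=python_text)
javascript_word_count = word_count(text=javascript_text)
result = compare_counts("Python", python_word_count, "JavaScript", javascript_word_count)"""

# Synthetic stand-ins for the downloaded articles (no network access).
FAKE_ARTICLES = {"Python": "word " * 6143, "JavaScript": "word " * 5079}


def fetch(topic: str) -> str:
    return FAKE_ARTICLES[topic]


def word_count(text: str) -> int:
    return len(text.split())


def compare_counts(label_a: str, count_a: int, label_b: str, count_b: int) -> str:
    if count_a > count_b:
        return f"{label_a} ({count_a:,}) > {label_b} ({count_b:,})"
    elif count_b > count_a:
        return f"{label_b} ({count_b:,}) > {label_a} ({count_a:,})"
    return f"{label_a} = {label_b} ({count_a:,})"


def run_plan(plan: str) -> dict:
    """Execute each plan line in a shared namespace; values stay in that dict."""
    namespace = {"fetch": fetch, "word_count": word_count, "compare_counts": compare_counts}
    for line in plan.splitlines():
        exec(line, namespace)
    return namespace


namespace = run_plan(PLAN)
print(f"plan size:    {len(PLAN)} chars")
print(f"article size: {len(namespace['python_text']):,} chars")
print(f"result:       {namespace['result']}")
```

The article text lives only in `namespace`; nothing in `PLAN` changes when the articles grow.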

How the data moves

text

OSAI -- articles as Python variables
----------------------------------------------------------
  fetch("Artificial Intelligence")    fetch("Computer Science")
            |                                  |
            |  84,083 chars                    |  29,882 chars
            |  (Python namespace)              |  (Python namespace)
            v                                  v
  count_mentions(ai_article, "algorithm")    count_mentions(cs_article, "algorithm")
            |                                  |
            +--------------+-------------------+
                           v
          compare_counts("AI", 25, "CS", 12)  -->  "AI (25) > CS (12)"
----------------------------------------------------------
  model sees: 5 lines of code, 0 chars of article text


Tool-calling loop -- articles through the context window
----------------------------------------------------------
  fetch("Artificial Intelligence")
            |
            v
  context: "Artificial intelligence (AI) is the simulation of...
            ... (84,083 chars)"         <- model reads the full article
            |
            v
  context: (AI article) + "Computer science is the study of...
            ... (29,882 chars)"         <- +29,882 chars more
            |
            v
  count_mentions("Artificial intelligence...", "algorithm")  <- full article again
  count_mentions("Computer science...", "algorithm")         <- full article again
----------------------------------------------------------
  model sees: 113,965 chars of text, multiple times over
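
The totals in the diagram can be checked with a few lines of arithmetic. The character counts are the ones shown above, and the plan text is copied from the second task's output; the comparison counts each article only once, so it is a lower bound for the tool-calling case.

```python
# Article sizes as reported in the diagram.
AI_CHARS = 84_083
CS_CHARS = 29_882

# The plan the model wrote for the 'algorithm' task, copied from the output above.
PLAN = """ai_article = fetch(topic='Artificial Intelligence')
cs_article = fetch(topic='Computer Science')
ai_count = count_mentions(text=ai_article, keyword='algorithm')
cs_count = count_mentions(text=cs_article, keyword='algorithm')
return compare_counts('AI', ai_count, 'CS', cs_count)"""

plan_chars = len(PLAN)
article_chars = AI_CHARS + CS_CHARS  # each article counted once

print(f"plan text:    {plan_chars:,} chars")
print(f"article text: {article_chars:,} chars")
```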

The `sentiment_score` exception

sentiment_score makes a direct model call from inside the primitive:

python

@primitive(read_only=True)
def sentiment_score(self, text: str) -> float:
    prompt = "Rate the overall sentiment ... \n\n" + text
    response = self._llm.generate(prompt)
    ...

This is a model call, but it happens inside a primitive during plan execution, not during plan generation. The planning step still never sees the article text. The plan just contains score = sentiment_score(text=article_text). The full article is passed to the model only when that line runs.
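
Because the reply comes back as free text, the parsing step deserves some care. The primitive above returns 0.0 whenever `float()` fails; the sketch below is a more lenient variant, offered as an assumption rather than the author's method. `parse_score` is a hypothetical helper that pulls the first number out of the reply and clamps it to the documented range.

```python
import re


def parse_score(reply: str, default: float = 0.0) -> float:
    """Extract the first number in a model reply and clamp it to [-1.0, 1.0]."""
    match = re.search(r"[-+]?\d*\.?\d+", reply)
    if match is None:
        # No number anywhere in the reply: fall back like the original primitive.
        return default
    value = float(match.group())
    return round(max(-1.0, min(1.0, value)), 4)


print(parse_score("0.35"))           # a clean reply
print(parse_score("Score: -0.25."))  # number wrapped in extra words
print(parse_score("1.7"))            # out of range, clamped
print(parse_score("positive"))       # no number at all
```

Whether clamping or rejecting an out-of-range reply is the better choice depends on how much the model is trusted to follow the instruction.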

Primitive reference

Primitive	What it does
`fetch(topic)`	Download a full Wikipedia article by topic name
`word_count(text)`	Count words
`char_count(text)`	Count characters
`unique_word_count(text)`	Count distinct words
`count_mentions(text, keyword)`	Count occurrences of a keyword
`most_common_words(text, n)`	Top n most frequent words
`sentiment_score(text)`	Score sentiment from -1.0 to +1.0 (model call inside primitive)
`sentiment_label(score)`	Map score to positive / neutral / negative
`concat(a, b)`	Join two texts
`compare_counts(label_a, count_a, label_b, count_b)`	Format a comparison string
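
The table lists char_count and unique_word_count, which do not appear in the analyst.py listing above. A minimal sketch of what they might look like follows, written as plain functions so it runs without the framework. The word pattern is borrowed from most_common_words; the actual definitions may differ.

```python
import re


def char_count(text: str) -> int:
    """Number of characters in the text."""
    return len(text)


def unique_word_count(text: str) -> int:
    """Number of distinct words, case-insensitive (assumed tokenization: runs of a-z)."""
    return len(set(re.findall(r"[a-z]+", text.lower())))


sample = "The cat and the hat"
print(char_count(sample))         # 19
print(unique_word_count(sample))  # 4: the, cat, and, hat
```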

What to notice

The plan is the same shape regardless of article length. A fetch call might return 40,000 characters for one article and 80,000 for another. The plan line has the same form either way.
Multiple fetches pipe independently. The comparison tasks fetch two articles. Each lands in its own variable; neither interferes with the other.
sentiment_score is the only model call after planning. Read the output for the Alan Turing task and compare its plan to the others. The plan is three lines: fetch, score, label. The model call for scoring happens when sentiment_score executes, not when the plan is written.
