Track 43 · Multi-agent & Advanced

Parallel document research

A ResearchAgent decomposes a multi-part question into per-document sub-tasks and runs a DocumentAgent for each in parallel threads. Results are synthesized into a single answer.

Intermediate · 10 min
Browse this tutorial's folder in tutorials-py: github.com/OpenSymbolicAI/tutorials-py/tree/main/43-parallel-research

A single question often spans multiple documents. Answering "which of Newton, Einstein, and Curie was born earliest?" means reading three articles. Reading them one at a time is slow. This tutorial runs one agent per article in parallel threads and combines the results.

Why not just ask the LLM?

Asked directly, a language model does not read the documents. It answers from what it absorbed during training, which may be stale or incomplete. Worse, it cannot tell you which paragraph an answer came from. Grounding each answer in a specific file makes the result verifiable.

A single agent with a single plan would also read the articles sequentially, waiting for each LLM call to finish before starting the next. A ThreadPoolExecutor runs all the article lookups concurrently instead. For a three-person question, three DocumentAgent plans execute in parallel, so the total time is roughly that of the slowest individual lookup rather than the sum of all three.
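The timing claim can be checked without any LLM. The following is a minimal sketch in which `time.sleep` stands in for a blocking LLM call. `fake_lookup`, `timed`, and the delay values are hypothetical and are not part of the tutorial's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-article latencies in seconds (illustrative values only).
DELAYS = {"newton.txt": 0.1, "einstein.txt": 0.3, "curie.txt": 0.2}


def fake_lookup(article: str) -> str:
    """Stand-in for one DocumentAgent run: block, then return a label."""
    time.sleep(DELAYS[article])
    return f"answer from {article}"


def timed(fn) -> tuple[list, float]:
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


articles = list(DELAYS)

# One lookup after another: elapsed time is about the sum of the delays.
sequential, t_seq = timed(lambda: [fake_lookup(a) for a in articles])


def run_parallel() -> list:
    # One worker per article; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=len(articles)) as pool:
        return list(pool.map(fake_lookup, articles))


# All lookups at once: elapsed time is about the longest single delay.
parallel, t_par = timed(run_parallel)

print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

Threads help here because each lookup spends its time waiting rather than computing. With a real model the gain also depends on whether the backend serves several requests at once, so measured speedups may be smaller than in this sketch.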

Two agents

DocumentAgent handles one file and one question. It has two primitives:

python
# document_agent.py
class DocumentAgent(PlanExecute):

    @primitive(read_only=True)
    def read_file(self, path: str) -> str:
        """Load a text file. Returns the first 4000 characters."""
        with open(path, encoding="utf-8") as f:
            # Cap the text at 4000 characters, as the docstring promises.
            return f.read(4000)

    @primitive(read_only=True)
    def extract_answer(self, text: str, question: str) -> str:
        """Answer question using only the supplied text. One or two sentences."""
        prompt = (
            "Answer the following question using ONLY the text provided.\n"
            "Be concise, one or two sentences.\n\n"
            f"Text:\n{text}\n\nQuestion: {question}"
        )
        return self._llm.generate(prompt).text.strip()

Given the goal "Read articles/newton.txt and answer: What year was Newton born?", it produces a two-line plan:

python
text = read_file("articles/newton.txt")
return extract_answer(text, "What year was Newton born?")

ResearchAgent coordinates multiple DocumentAgent instances with three primitives:

python
# research_agent.py
from concurrent.futures import ThreadPoolExecutor

class ResearchAgent(PlanExecute):

    @primitive(read_only=True)
    def decompose(self, question: str) -> list:
        """Split a multi-part question into sub-tasks.

        Each sub-task covers exactly one person and one article.
        Returns a list of dicts: [{"question": "...", "article": "..."}, ...]
        """

    @primitive(read_only=True)
    def research_parallel(self, sub_tasks: list) -> list:
        """Run one DocumentAgent per sub-task in a thread pool.

        Returns a list of answer strings, one per sub-task.
        """
        def run_one(task: dict) -> str:
            goal = f"Read {task['article']} and answer: {task['question']}"
            return DocumentAgent(llm=self._llm).run(goal).result or ""

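        if not sub_tasks:
            return []  # ThreadPoolExecutor rejects max_workers=0

        # pool.map returns results in the same order as sub_tasks.
        with ThreadPoolExecutor(max_workers=len(sub_tasks)) as pool:
            return list(pool.map(run_one, sub_tasks))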

    @primitive(read_only=True)
    def synthesize(self, question: str, findings: list) -> str:
        """Combine individual findings into a single coherent answer."""

Every research question generates the same three-line plan:

python
sub_tasks = decompose(question)
findings  = research_parallel(sub_tasks)
return synthesize(question, findings)

Run it

bash
uv add opensymbolicai-core
ollama pull qwen2.5-coder:7b
uv run main.py

Wikipedia articles are downloaded automatically on first run.

Sample output

text
Model: qwen2.5-coder:7b
============================================================

Q: Which of Newton, Einstein, and Curie was born earliest, and in which country?
   [newton.txt]   In what year was Newton born?
      -> 1643
   [einstein.txt] In what year was Einstein born?
      -> 1879
   [curie.txt]    In what year was Curie born?
      -> 1867
   Answer: Newton was born earliest, in 1643, in England.
   (10.1s)

Q: Who among Einstein, Curie, Darwin, and Turing won a Nobel Prize, and for what?
   [einstein.txt] Did Einstein win a Nobel Prize?
      -> Yes, Albert Einstein won the 1921 Nobel Prize in Physics.
   [curie.txt]    Did Curie win a Nobel Prize?
      -> Yes, Marie Curie won two Nobel Prizes: Physics 1903, Chemistry 1911.
   [darwin.txt]   Did Darwin win a Nobel Prize?
      -> No, Darwin did not win a Nobel Prize.
   [turing.txt]   Did Turing win a Nobel Prize?
      -> No, Turing did not win a Nobel Prize.
   Answer: Einstein won the 1921 Nobel Prize in Physics for the photoelectric
           effect. Curie won two: Physics in 1903 for radioactivity research
           and Chemistry in 1911 for discovering radium and polonium.
           Darwin and Turing did not win a Nobel Prize.
   (26.0s)

Q: What were the main fields of work for Darwin and Turing? Did their lives overlap?
   [darwin.txt]   What were the main fields of work for Darwin?
      -> Natural history, geology, and biology.
   [turing.txt]   What were the main fields of work for Turing?
      -> Theoretical computer science, algorithms, and cryptanalysis.
   Answer: Darwin worked in natural history, geology, and biology. Turing
           worked in theoretical computer science and cryptanalysis. Darwin
           died in 1882; Turing was born in 1912. Their lives did not overlap.
   (17.9s)

What to notice

Every research question produces the same three-line plan. decompose, research_parallel, synthesize is the fixed shape. The LLM does not need to decide how many threads to use or which articles to open; that is decided inside research_parallel based on what decompose returned. The plan is short and predictable because the primitives carry the complexity.

Thread safety comes from fresh agent instances. Each call to run_one creates a new DocumentAgent. PlanExecute agents hold no shared mutable state between runs, so running many of them in a thread pool is safe. The one object the threads do share is the llm passed into every DocumentAgent, and it guards its cache with its own internal lock.
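A minimal sketch of the same isolation pattern, using stand-in classes. `ScratchAgent` and `CachedClient` are hypothetical and only mimic the shape described above: per-instance state that is never shared, plus one shared object that guards its cache with a lock. They are not the opensymbolicai-core classes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CachedClient:
    """Hypothetical shared client: one cache, guarded by a lock."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}
        self._lock = threading.Lock()

    def generate(self, prompt: str) -> str:
        # Only one thread at a time may read or write the cache.
        with self._lock:
            if prompt not in self._cache:
                self._cache[prompt] = prompt.upper()  # placeholder "model"
            return self._cache[prompt]


class ScratchAgent:
    """Hypothetical agent whose mutable state lives on the instance."""

    def __init__(self, llm: CachedClient) -> None:
        self._llm = llm
        self.trace: list[str] = []  # private to this instance

    def run(self, goal: str) -> str:
        self.trace.append(goal)
        return self._llm.generate(goal)


shared_llm = CachedClient()


def run_one(goal: str) -> tuple[str, int]:
    agent = ScratchAgent(shared_llm)  # fresh instance per task
    answer = agent.run(goal)
    return answer, len(agent.trace)


goals = [f"goal {i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=len(goals)) as pool:
    results = list(pool.map(run_one, goals))
```

Each trace ends up with exactly one entry because no two threads ever touch the same agent. The lock matters only for the client, which is the single object all threads reach.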

decompose uses an internal LLM call. The primitive calls self._llm.generate() directly to parse the question and produce a JSON list of sub-tasks. The plan only sees the list that comes back. This pattern appeared in Track 38 and Track 39: use an internal LLM call when the transformation is linguistic, not logical.
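The body of decompose is not shown in this tutorial. As a rough sketch of the pattern, assuming the model replies with a JSON list that may be wrapped in extra text, the parsing step might look like this. `parse_sub_tasks` and the sample reply are hypothetical. Only the output shape, a list of dicts with `question` and `article` keys, comes from the decompose docstring.

```python
import json


def parse_sub_tasks(raw: str) -> list[dict]:
    """Extract [{"question": ..., "article": ...}, ...] from an LLM reply."""
    # Models often wrap JSON in prose or code fences, so slice from the
    # first "[" to the last "]" before parsing.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in model reply")
    items = json.loads(raw[start : end + 1])

    sub_tasks = []
    for item in items:
        # Keep only well-formed entries; drop anything missing a key.
        if isinstance(item, dict) and {"question", "article"} <= item.keys():
            sub_tasks.append(
                {"question": str(item["question"]), "article": str(item["article"])}
            )
    return sub_tasks


# Hypothetical model reply, for illustration only.
reply = """Here are the sub-tasks:
[
  {"question": "In what year was Newton born?", "article": "articles/newton.txt"},
  {"question": "In what year was Curie born?", "article": "articles/curie.txt"},
  {"note": "malformed entry"}
]"""

sub_tasks = parse_sub_tasks(reply)
```

Validating inside the primitive keeps the plan simple: research_parallel can rely on every sub-task having both keys, and a malformed reply fails loudly at the parsing step instead of deep inside a worker thread.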

The outer agent's plan is always three lines. No matter how many people the question mentions, ResearchAgent writes the same plan and delegates the per-document work to inner agents. Adding a sixth article to the corpus requires no changes to either agent class.