Structured memory across sessions

An LLM has no memory by default. Every session starts fresh. This tutorial gives an agent a persistent memory file: facts the user shares are written to memory.json, and the next session reads the same file before responding.

No vector database. No embeddings. Just a file.

Why a file and not just the LLM?#

A language model's context window resets when the process ends. If a user tells the agent their name in session 1, that name is gone when the process is restarted for session 2. Writing facts to disk is the simplest way to bridge sessions: one file, readable by any future instance of the agent.

The structured approach stores each fact under a named key. When the user says "I use VS Code", the plan calls save_fact("editor", "VS Code"). When the user asks "what editor do I use?", the plan fetches exactly that key. The file stays compact because updating a fact replaces the old value.

The agent#

Six primitives handle writing, reading, and querying the memory file.

python

# memory_agent.py
class MemoryAgent(PlanExecute):

    @primitive(read_only=False)
    def save_fact(self, key: str, value: str) -> str:
        """Save a single fact to memory.json, overwriting if the key exists.

        Example: save_fact("name", "Sam") -> "Saved: name = Sam"
        """

    @primitive(read_only=True)
    def load_memory(self) -> str:
        """Read all stored facts from memory.json.

        Example: load_memory() -> "name: Sam\\neditor: VS Code"
        """

    @primitive(read_only=True)
    def load_keys(self) -> str:
        """Return all stored keys as a comma-separated string.

        Example: load_keys() -> "name, editor, language"
        """

    @primitive(read_only=True)
    def select_relevant_keys(self, question: str, available_keys: str) -> str:
        """Use an LLM to pick which keys are relevant to a question.

        Example: select_relevant_keys("What editor?", "name, editor") -> "editor"
        """

    @primitive(read_only=True)
    def get_facts_for_keys(self, keys: str) -> str:
        """Fetch values for a comma-separated list of keys.

        Returns "No results found." if none of the keys exist.
        Example: get_facts_for_keys("editor") -> "editor: VS Code"
        """

    @primitive(read_only=True)
    def respond(self, message: str) -> str:
        """Return the message as the final response."""

For broad recall ("what do you know about me?"), the plan calls load_memory and passes everything to respond. For specific questions, the plan calls load_keys, lets select_relevant_keys pick which ones matter, fetches their values with get_facts_for_keys, and responds. If nothing matches, get_facts_for_keys returns "No results found.".

Run it#

bash

uv run main.py

Five sessions on the same memory.json:

text

============================================================
Session 1
------------------------------------------------------------
User: My name is Sam and I work mainly in Go.

Plan:
  r1 = save_fact('name', 'Sam')
  r2 = save_fact('primary_language', 'Go')
  return respond("Got it! I've saved your name and primary programming language.")

Agent: Got it! I've saved your name and primary programming language.

memory.json after this session:
  name: Sam
  primary_language: Go

============================================================
Session 2
------------------------------------------------------------
User: I also prefer light mode and use VS Code.

Plan:
  r1 = save_fact('light_mode_preference', 'on')
  r2 = save_fact('editor', 'VS Code')
  return respond("Got it! I've saved your preference for light mode and editor.")

Agent: Got it! I've saved your preference for light mode and editor.

memory.json after this session:
  name: Sam
  primary_language: Go
  light_mode_preference: on
  editor: VS Code

============================================================
Session 3
------------------------------------------------------------
User: What do you know about me?

Plan:
  memory = load_memory()
  return respond(memory)

Agent: name: Sam
primary_language: Go
light_mode_preference: on
editor: VS Code

============================================================
Session 4
------------------------------------------------------------
User: What editor do I use?

Plan:
  keys = load_keys()
  relevant = select_relevant_keys('What editor do I use?', keys)
  values = get_facts_for_keys(relevant)
  return respond(values)

Agent: editor: VS Code

============================================================
Session 5
------------------------------------------------------------
User: What coffee brand do I drink?

Plan:
  keys = load_keys()
  relevant = select_relevant_keys('What coffee brand do I drink?', keys)
  values = get_facts_for_keys(relevant)
  return respond(values)

Agent: No results found.

What to notice#

Each session is a fresh instance. MemoryAgent is created anew for every session in main.py. There is no state in Python memory between sessions. Persistence comes entirely from memory.json on disk.

Two read patterns, one agent. For "what do you know about me?" the plan uses load_memory to dump everything. For "what editor do I use?" it uses load_keys + select_relevant_keys + get_facts_for_keys to fetch only what is relevant. The decomposition examples in memory_agent.py teach the LLM both patterns.

select_relevant_keys makes an internal LLM call. The primitive calls self._llm.generate() directly to match a natural language question against stored key names. This is a second LLM call inside the execution phase, after the plan is already fixed. The result feeds into get_facts_for_keys as a string, which is pure Python.

Unknown facts return a clean message. Session 5 asks about something never stored. get_facts_for_keys finds no matching key and returns "No results found." rather than hallucinating an answer.

The tradeoff: facts need a name. Key-value storage works well when information has a clear label: name, editor, timezone. It does not handle free-form notes like a debugging session or a meeting summary. For that, see Track 39.