March 15, 2026 · 9 min read

The Memory Problem Nobody Talks About

You start a session with an AI agent. It is sharp and accurate, and it follows your instructions precisely. Two hours later, it contradicts something you told it at the beginning. It forgets a constraint. It starts repeating work it already completed. The agent has not gotten worse. Its context window has gotten full, and the system is quietly discarding information to make room. This is the memory problem, and it affects every long-running agent session in ways most users never notice until something breaks.

How Context Windows Actually Work

Every AI model processes a fixed amount of text at once. This is the context window: the total number of tokens the model can hold in a single pass. Current models range from 128K to 200K tokens, which sounds enormous until you realize how fast it fills up in a working session.

A typical coding session generates context quickly. Your system prompt uses 1,000 to 3,000 tokens. Each file the agent reads adds hundreds or thousands more. Every message you send and every response the agent generates stays in the window. Tool outputs, error messages, and intermediate results all accumulate. A productive two-hour session can easily generate 80,000 to 120,000 tokens of conversation history.
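To see how fast that adds up, here is a rough back-of-the-envelope sketch in Python. Every per-item count below is an illustrative assumption, not a measurement:

# Rough context budget for a two-hour session.
# All per-item token counts are illustrative assumptions.
CONTEXT_WINDOW = 200_000  # tokens, upper end of current models

system_prompt = 2_000      # fixed overhead
file_reads = 25 * 2_000    # 25 files at ~2,000 tokens each
messages = 120 * 500       # 120 exchanges at ~500 tokens each
tool_outputs = 80 * 400    # tool results, errors, logs

used = system_prompt + file_reads + messages + tool_outputs
print(f"used {used:,} of {CONTEXT_WINDOW:,} ({used / CONTEXT_WINDOW:.0%})")
# used 144,000 of 200,000 (72%)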

The context window is not a scrolling log. It is a fixed-size container. When it fills up, something has to go.

What Happens When Context Fills Up

Different systems handle context overflow differently, but the two most common strategies are truncation and compaction.

Truncation

The simplest approach: drop the oldest messages from the conversation. This preserves recent context but loses early instructions, initial constraints, and foundational decisions made at the start of the session. The system prompt usually survives because it is treated specially, but everything else is fair game.
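In sketch form, truncation is a drop-from-the-front loop. This is a minimal illustration of the idea in Python, not any particular system's implementation:

from dataclasses import dataclass

@dataclass
class Message:
    role: str    # "system", "user", "assistant", or "tool"
    tokens: int
    text: str

def truncate(history: list[Message], budget: int) -> list[Message]:
    """Drop the oldest non-system messages until the history fits."""
    system = [m for m in history if m.role == "system"]  # pinned
    rest = [m for m in history if m.role != "system"]
    total = sum(m.tokens for m in history)
    while rest and total > budget:
        # Oldest first; early constraints vanish silently.
        total -= rest.pop(0).tokens
    return system + rest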

Compaction

A more sophisticated approach: summarize older conversation segments into compressed representations. The full detail is lost, replaced by a summary that captures the gist. This preserves more information than truncation but introduces a critical problem: the model decides what is important enough to keep. Subtle constraints, nuanced instructions, and context-dependent decisions are exactly the kind of detail that gets lost in summarization.
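Compaction can be sketched the same way, reusing the Message type from the truncation example. The summarize callable here stands in for a model call, and its judgment about what matters is exactly where the loss happens; again, a hypothetical sketch rather than any specific vendor's behavior:

def compact(history: list[Message], budget: int, summarize) -> list[Message]:
    """Replace the older half of the conversation with a lossy summary."""
    if sum(m.tokens for m in history) <= budget:
        return history
    system = [m for m in history if m.role == "system"]
    rest = [m for m in history if m.role != "system"]
    old, recent = rest[: len(rest) // 2], rest[len(rest) // 2 :]
    synopsis = summarize(old)  # detail -> gist; subtle constraints may not survive
    summary = Message(role="assistant",
                      tokens=len(synopsis) // 4,  # rough chars-to-tokens estimate
                      text=f"[Summary of earlier conversation]\n{synopsis}")
    return system + [summary] + recent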

Both approaches share the same fundamental flaw. The agent does not know what it has forgotten. It continues operating with full confidence, unaware that critical context has been dropped. There is no error message. There is no warning. The agent just quietly gets dumber.

Why Agents Lose Important Details

The details most likely to be lost during compaction are precisely the ones that matter most for consistent agent behavior:

- Negative constraints ("do NOT modify this file") stated once, early in the session, and never repeated
- Corrections you made after the agent got something wrong the first time
- Nuanced, context-dependent decisions whose reasoning does not survive a summary
- Foundational choices made at the start of the session that everything later quietly depends on

Strategies to Mitigate Memory Loss

You cannot eliminate the context window constraint, but you can design your workflow to minimize its impact.

Structured Memory Files

The most effective strategy is externalizing important context into files that persist outside the conversation. Create a MEMORY.md file in your project that the agent reads at the start of each session and updates periodically during long sessions.

# MEMORY.md

## Project State
- Currently refactoring the authentication module
- Migration from JWT to session-based auth is 60% complete
- Do NOT modify /config/legacy-auth.js (needed for backward compat)

## Decisions Made
- Chose PostgreSQL over Redis for session storage (see ADR-004)
- Error responses use RFC 7807 format
- All new endpoints require rate limiting middleware

## Current Task
- Implementing session cleanup cron job
- Blocked on: need to decide TTL for inactive sessions

This file acts as external long-term memory. When the context window compacts, the agent can re-read MEMORY.md to recover critical state.

Periodic Checkpoints

Every 30 to 45 minutes during a long session, ask the agent to update the memory file with current state, recent decisions, and active constraints. This is the equivalent of saving your game. If context gets compacted, you have a recent checkpoint to restore from.

Explicit Memory Saves

After any important decision or correction, explicitly instruct the agent to write it to the memory file. Do not rely on the conversation history to preserve it. "Add to MEMORY.md: we decided to use streaming responses for the export endpoint because files can exceed 500MB."

Session Segmentation

Rather than running one marathon session, break work into focused 45-minute sessions with clear handoff artifacts. At the end of each session, the agent writes a summary of what was accomplished, what is in progress, and what to do next. The next session starts by reading that summary. This approach works with the context window rather than against it.
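The handoff artifact can follow the same shape as MEMORY.md. One possible template, using the tasks from the example above:

# HANDOFF.md

## Accomplished
- Session cleanup cron job implemented and passing tests

## In Progress
- Wiring rate limiting middleware into the new session endpoints

## Next Steps
- Decide the TTL for inactive sessions, then enable the cleanup job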

Working Memory vs. Long-Term Memory

It helps to think about agent memory in two categories, borrowing from cognitive science.

Working memory is the context window itself. It holds what the agent is actively processing: the current conversation, recent file contents, and immediate task context. It is fast and detailed but finite and volatile.

Long-term memory is everything stored in files: MEMORY.md, project documentation, previous session summaries, decision logs. It is persistent and unlimited but requires explicit reads to access. The agent does not automatically remember what is in long-term memory. It has to be told to look.
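In practice, "told to look" can be as simple as injecting the file into the prompt at session start. A minimal sketch, assuming a generic chat API that takes a list of role-tagged messages:

from pathlib import Path

def session_start(base_system_prompt: str) -> list[dict]:
    """Load long-term memory into working memory at the start of a session."""
    memory_file = Path("MEMORY.md")
    memory = memory_file.read_text() if memory_file.exists() else ""
    system = base_system_prompt
    if memory:
        # Long-term memory only helps once it is explicitly placed
        # into the context window.
        system += f"\n\n## Persistent project memory\n{memory}"
    return [{"role": "system", "content": system}]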

The practical implication is straightforward. Anything that matters beyond the current session needs to exist in a file. If it only exists in the conversation, it will eventually be lost. Build the habit of externalizing context early, and the memory problem becomes manageable rather than catastrophic.
