Context Compression Strategies: Offloading and Summarisation¶

Tiered compression — offloading large payloads and summarising history — keeps long-running agents within the context window without losing task continuity.

Learn it hands-on: Offload vs Summarise — guided lesson with quizzes.

The problem¶

Long-horizon tasks accumulate context from conversation turns, tool inputs, and tool outputs. Without compression, the agent truncates arbitrarily or the session fails. Compression keeps the task intent and critical state while discarding low-value content.

Tiered compression¶

LangChain's Deep Agents framework implements three compression tiers, applied in order as context pressure increases (Context Management for Deep Agents):

graph TD
    A[Context fills] --> B{Large tool response?}
    B -->|Yes| C[Offload to filesystem]
    C --> D[Replace with reference + summary]
    B -->|No| E{85% threshold reached?}
    D --> E
    E -->|Yes| F[Summarise conversation history]
    F --> G[Restart with compressed context]
    E -->|No| H[Continue normally]

Tier 1: offload large tool responses¶

Replace large tool payloads (full files, API responses, search results) with a filesystem reference and brief summary. Full content goes to disk, and the agent re-reads it when needed. This keeps content recoverable without holding payloads in active context. You can configure the thresholds — frameworks typically set them in the tens of thousands of tokens.

Tier 2: summarize conversation history¶

When context fills further, summarize prior turns. Keep the current objective, key artifacts, decisions and rationale, and next steps. Discard exploratory turns, superseded instructions, resolved errors, and intermediate reasoning that did not affect outcomes. The agent restarts with the summary as prior context. Anthropic's context engineering guide calls this "compaction" and names it a core strategy for long-horizon tasks.

Cache preservation during compaction¶

Compaction reuses the parent session's cached prefix, so a cache_control breakpoint at the end of the system prompt keeps that cache valid across the cycle. Only the new summary lands as a fresh entry, which keeps post-compaction turns cheap (Anthropic's compaction guide).

Progressive five-stage compaction¶

OPENDEV extends the two-tier approach with Adaptive Context Compaction (ACC), a five-stage pipeline triggered at specific context budget thresholds (Bui, 2026 §2.3.6):

Stage	Trigger	Action
1 — Warning	70% budget	Log context pressure for monitoring; no data reduction
2 — Observation Masking	80% budget	Replace older tool results with compact reference pointers
2.5 — Fast Pruning	85% budget	Prune older tool outputs beyond recency window
3 — Aggressive Masking	90% budget	Shrink preservation window to only most recent outputs
4 — Full Compaction	99% budget	Serialize history to scratch file; LLM-summarize middle portion

Recent tool outputs stay at full fidelity. An Artifact Index serialized into compaction summaries tracks every file touched, and the summary carries the history archive path — making compaction effectively non-lossy (Bui, 2026 §2.3.6).

Graduated stages let the agent degrade step by step rather than hitting a single compression cliff where the full history collapses at once.

What to preserve in summaries¶

Summaries that only capture "what happened" without "what matters next" cause objective drift. An effective summary structure:

Section	Content
Objective	The original task and any scope changes
State	What has been built, changed, or decided
Constraints	Any constraints surfaced during the session
Next steps	The immediate next action

Why it works¶

Transformer attention runs over all tokens in the window. As context grows, relevant signal competes with accumulated noise — redundant tool outputs, superseded reasoning, resolved errors — and retrieval precision degrades. Compression reduces this noise floor. Offloading removes content that is addressable on demand but rarely needed. Summarization distills decision rationale and state into a compact form the model can condition on. The mechanism is selective discarding, not lossy encoding — artifacts remain on disk, so compaction is non-destructive for recoverable content.

The effect is measurable. One empirical study reports that pruning context to the last five tool call/response pairs, plus summarization, reached 91.6% task completion versus 71% for full-context agents, at a fraction of the tokens and runtime. This supports combining the offload and summarize tiers rather than carrying full history (Pruning and summarising context for tool-using agents).

When this backfires¶

Compression degrades task continuity when applied incorrectly:

Silent context loss: aggressive summarization drops subtle constraints whose importance only emerges later. Anthropic's context engineering guide recommends starting with maximum recall and iterating toward precision, not the reverse.
Premature compaction: a too-low threshold forces lossy summarization when context is still navigable, causing objective drift if it omits scope constraints.
Broken recoverability: offloaded payloads deleted or moved after compaction cannot be re-read, which makes the approach worse than in-context storage. The observation store must persist for the full session lifetime.
Compounding errors across cycles: each cycle introduces summarization error. Long sessions accumulate drift a single summary cannot undo.

Testing compression¶

Threshold stress-testing: lower the threshold, then verify task continuity across cycles
Recoverability: after offloading, verify the agent retrieves content on demand
Objective drift check: after summarization, verify the next action matches the original task

Key Takeaways¶

Tiered compression applies in sequence: offload large tool responses first, then summarise history.
Five-stage compaction provides graduated degradation instead of a single compression cliff.
Summaries must preserve task objective, current state, and next steps — not just action history.
Offloading preserves recoverability; summarisation is lossy — retain decision rationale, not just outcomes.
Compaction reuses the cached system-prompt prefix, so a cache_control breakpoint keeps post-compaction turns cheap.

Example¶

Pseudocode showing how tiered compression maps to agent configuration:

# Pseudocode — illustrates the tiered compression pattern,
# not a specific framework's API.

agent = Agent(
    tools=[...],
    # Tier 1: offload tool responses above 20k tokens to disk
    max_observation_length=20_000,
    observation_store="./agent_observations/",
    # Tier 2: summarise at 85% context budget
    compaction_threshold=0.85,
    compaction_summary_prompt=(
        "Summarise: (1) current objective, (2) key artifacts created, "
        "(3) decisions made and rationale, (4) immediate next step."
    ),
)

The summarizer prompt structure maps to the preservation table above: objective, state, constraints, next steps.

Manual Compaction as Dumb Zone Mitigation
Post-Compaction Re-read Protocol — restoring instruction-file fidelity after compaction summaries paraphrase rules
Context Window Dumb Zone
Prompt Compression: Maximizing Signal Per Token
Context Budget Allocation: Every Token Has a Cost
Lost in the Middle: The U-Shaped Attention Curve
Goal Recitation: Countering Drift in Long Sessions
The Infinite Context