Tiered Memory Architecture¶
A two-tier memory store whose pipeline promotes episodic facts into a semantic tier on re-use — improving long-window retrieval only for long, recurring sessions.
The architecture¶
A flat-file memory store keeps growing. Every episode lands in the same JSONL or vector index, and signal dilutes as the corpus expands. Tiered architectures separate raw episodes from generalized facts and run a promotion pipeline between them.
MEMTIER (Sidik and Rokach, 2026) defines five components:
- Episodic JSONL store — observations, tool calls, and outcomes appended as structured records
- Five-signal weighted retrieval — relevance, recency, outcome, frequency, and structural compatibility scored per query
- Attention-attributed cognitive weight loop — entry weights updated from how the model actually attended to retrieved entries
- Asynchronous consolidation daemon — promotes episodic entries into a semantic tier when re-use crosses a threshold
- PPO-based retrieval policy — adapts the per-tier weights from feedback rather than hand-tuned constants
graph LR
A[Agent turn] --> B[Episodic JSONL<br>append-only]
B -.->|consolidation<br>daemon| C[Semantic tier<br>generalised facts]
A --> D[Weighted retrieval]
B --> D
C --> D
D --> E[PPO policy<br>tier weights]
E -.->|update| D
style C fill:#2d5a2d,stroke:#4a4a4a,color:#e0e0e0
style E fill:#2d4a5a,stroke:#4a4a4a,color:#e0e0e0
The semantic tier is not pre-populated. It accumulates only entries the daemon observes retrieved across multiple unrelated triggers — the test for whether a fact has generalized beyond its original episode.
What promotion buys you¶
Flat-store retrieval re-ranks every entry on every call, so a high-value generalized fact competes against thousands of low-value raw observations. Promoting it into a smaller, separately-scored tier evaluates that fact against fewer competitors and lets the tiers carry different weights (MEMTIER §3). Broader memory surveys treat consolidation between memory forms as a generalization step, not a storage optimization (Memory in the Age of AI Agents, arxiv:2512.13564).
Reported numbers are conditional. MEMTIER claims 38.2% accuracy on LongMemEval-S with Qwen2.5-7B — +33 percentage points over a full-context baseline at 5% on a 6GB consumer GPU (Sidik and Rokach, 2026). The baseline is weak; a tuned RAG-over-JSONL system operates well above 5%, so the margin over a non-tiered RAG store is smaller than the headline suggests. The paper is preprint-only and unreplicated. Treat the architecture as defensible, not the absolute numbers as load-bearing.
When tiering pays off¶
The overhead — a consolidation daemon, attention-attribution loop, and PPO policy network — is spread across long windows and recurring tasks. It is not free.
Tiering pays off when:
- Operation windows exceed a day or two. The 14-percentage-point degradation over 72 hours that motivates the design (MEMTIER abstract) is the regime where consolidation fires often enough to matter.
- Task structure is recurring. The PPO policy learns from outcome feedback. Without recurring task signatures it never converges, and tier-aware retrieval underperforms a static recency-weighted baseline.
- Retrieval is dilution-bound, not relevance-bound. Below a few thousand entries the embedding model dominates, and tier separation contributes little.
- Cross-tenant isolation is required. A separate semantic tier with controlled promotion is the natural place for provenance and pruning policies once stored episodes become an attack surface (Memory Poisoning and Secure Multi-Agent Systems, arxiv:2603.20357).
When a flat store is the right answer¶
Tiering adds two new failure surfaces: incorrect promotion (an episodic fact generalized into a wrong semantic rule) and policy drift (the PPO retrieval policy learning to over- or under-fetch from the wrong tier). Both compound with agent lifetime, the regime where memory is supposed to help (arxiv:2512.13564).
Skip tiering when:
- Sessions are short (sub-day) — promotion never fires often enough to cover the daemon.
- Latency dominates accuracy — per-turn cost from consolidation, attention attribution, and a PPO policy inflates inner-loop time.
- Single-developer and single-tenant — tier isolation costs are not justified.
- A simpler design already meets the bar. A flat JSONL store with an embedding index (a flat RAG store), a recency multiplier, and periodic LLM-summarized compaction captures most of the value. Site patterns — episodic memory retrieval, memory synthesis from execution logs, Memory Retrieval as a Control Decision — cover the same ground at lower operational complexity.
Risks specific to tier promotion¶
- Wrong-direction generalization — frequency does not distinguish "valid across contexts" from "the same incident kept recurring in one context." Stack- or environment-specific entries get promoted then misapplied.
- Stale semantic facts persist longer — promoted entries weighted higher decay slower. Facts invalidated by a refactor outlive their originating episodes — the staleness mode in agent memory patterns, amplified by tier weighting.
- Policy drift on heterogeneous workloads — a PPO policy trained on one distribution silently retrieves from the wrong tier when the workload shifts.
Mitigate by gating promotion on a confidence signal, reviewer pass, or semantic-tier expiry — not on frequency alone.
Key Takeaways¶
- A two-tier store with consolidation is one design point on the agent-memory spectrum, not a default — it pays off for long operation windows and recurring task structure
- Promotion should be conditional on observed cross-context re-use, not raw frequency, to avoid generalising single-context facts
- Reported accuracy gains are against a weak full-context baseline; the margin over a well-tuned flat RAG store is smaller and unreplicated
- Tiering adds incorrect-promotion and policy-drift failure modes that scale with agent lifetime — audit the promotion step explicitly
Related¶
- Episodic Memory Retrieval — episode-keyed recall without explicit tier promotion
- Agent Memory Patterns: Learning Across Conversations — scope-based memory architecture for cross-session learning
- Memory Retrieval as a Control Decision — controller deciding whether to inject retrieved memory
- Memory Synthesis from Execution Logs — extracting causal lessons from execution traces into persistent knowledge
- Subtask-Level Memory for Software Engineering Agents — granularity choice in memory retrieval
- Memory Retrieval as a Control Decision — utility-score updates for stored memories from outcome feedback
- Generative Agents Memory Stream — three-layer architecture for long-running agents with high observation density
- Component-Isolated Memory Stress Testing — stress-tests the summarisation, storage, and retrieval stages of this pipeline so a regression attributes to one tier