Attention Latch: When Agents Stay Anchored to Stale Instructions¶

Cumulative historical context in decoder-only Transformers can over-squash mid-task updates, leaving multi-turn agents anchored to obsolete constraints despite explicit contradictory instructions.

The failure mode¶

An agent receives an instruction that contradicts an earlier one mid-session, then keeps acting on the earlier one. Shehata and Li (2026) name this the Attention Latch: the cumulative probabilistic weight of historical context overrides mid-task updates, anchoring the agent to obsolete constraints despite explicit contradictory instructions (Shehata & Li, 2026).

The latch is the behavioral face of Information Over-squashing in decoder-only autoregressive Transformers (Barbero et al., 2024). As history grows, distinct input sequences collapse to near-identical final-token representations, so a late instruction cannot move the representation far enough to change behavior.

Why over-squashing causes the latch¶

Decoder-only attention is causal: information from earlier tokens adds to the final-token representation through every later layer. The one-directional flow converging at the final token loses sensitivity to specific tokens, and low-precision floating-point formats make this worse (Barbero et al., 2024). The longer the history, the smaller the influence any single new instruction has.

This compounds with the U-shaped attention curve: a contradictory instruction inserted mid-session lands in the low-attention middle zone, where positional bias and over-squashing combine to suppress it (Liu et al., 2023).

graph TD
    H[Long multi-turn history] -->|cumulative weight| F[Final-token representation]
    U[Mid-task update] -->|low marginal influence| F
    F --> B[Behaviour anchored to old context]
    H -->|positional bias| M[Update in low-attention middle]
    M --> B

How to recognize it¶

This differs from an instruction-following failure on a fresh prompt. Three signals point to it:

The agent acknowledged the new instruction earlier in the turn but then acted on the old one — unlike objective drift, where the instruction silently falls out of context.
Resetting the conversation and reissuing the same instruction produces compliance.
Compliance returns when the contradicting prefix is removed.

If all three hold, the cause is structural over-squashing rather than ambiguous wording.

Where it triggers¶

Shehata and Li (2026) mapped the Attention Stability Boundary empirically across 9K trajectories on MultiWOZ 2.2. On the hardest tier — a semantic-hijacked 3-hop multi-fact synthesis task — vanilla ReAct on GPT-5.4 collapsed to 0.1% success (Shehata & Li, 2026). The agent crosses the boundary when:

Histories are long enough for cumulative weight to dominate.
Mid-task updates contradict, rather than extend, prior constraints.
Retrieval results inject content that semantically resembles the contradicted instruction.

Independent work confirms the wider pattern: sequences over 100K tokens show goal drift across model families, mostly through inaction (Arike et al., 2025); models deprioritize initial instructions as history grows, even when those instructions remain in context (Bui, 2026 §3.2).

Mitigations on a spectrum¶

Match the mitigation cost to how often the latch fires in your workload. Lightweight options first.

1. Recency anchoring (lightweight)¶

Push current objectives into the high-attention tail at every step. Goal recitation rewrites the objective and to-do list after each tool call. Event-driven system reminders inject the contradicting instruction as a fresh user-role message at the decision point. These do not remove over-squashing; they place the new instruction where attention is strongest.

2. History reset (medium)¶

Bound cumulative history before it dominates. The Ralph Wiggum Loop restarts each iteration from a fresh context, re-reading the specification from disk. Post-compaction re-read protocols restore foundational instructions after summarization. These address the latch at its root.

3. Architect and Executive separation (heavy)¶

Run high-level planning in one context (the Architect) and turn-by-turn execution in a separate, scoped context (the Executive) per turn — Shehata and Li's SSRP framework (Shehata & Li, 2026). Structural variants already covered on this site:

Cognitive Reasoning vs Execution Separation — typed-tool-interface seam between layers.
Discrete Phase Separation — conversation-boundary version, with each phase in its own conversation.

Choose this tier only after you have measured the lighter mitigations and found them not enough. The split adds an extra LLM call per turn, schema-versioning churn, and orchestration overhead, and most workloads do not cross the boundary (Microsoft Azure Architecture Center).

The grounding paradox¶

Heavy mitigations can overshoot. Shehata and Li (2026) report a Procedural Integrity audit at 98.8% adherence that reveals a Grounding Paradox: high-stability models fail by refusing to generate output under retrieval-reasoning contamination — the agent holds its ground so firmly it stops responding to legitimate updates (Shehata & Li, 2026). Verify that the failure has been removed, not relocated.

Where the latch does not fire¶

Short single-objective tasks. Cumulative history stays small relative to the latest turn.
Append-only updates. Extending prior context does not require overcoming over-squashing.
Aggressive harness-level resets. Frequent compaction or Ralph Wiggum-style restarts keep histories below the boundary.
Single-turn flows. The boundary is a multi-turn phenomenon.

Key Takeaways¶

The Attention Latch is the behavioural face of decoder-only over-squashing — a structural property, not a prompt bug.
It triggers when long histories collide with contradicting mid-task updates, especially in the U-shaped middle zone.
Mitigate on a spectrum: recency anchoring first, history reset next, architectural split only when measured drift justifies the overhead.
Heavy mitigations introduce the Grounding Paradox — verify the failure is removed, not relocated.

Lost in the Middle: The U-Shaped Attention Curve — the positional-bias half of the same problem
Goal Recitation: Countering Drift in Long Sessions — recency-anchoring mitigation
Event-Driven System Reminders — harness-injected reminders at decision points
Post-Compaction Re-read Protocol — restoring foundational instructions after summarisation
Objective Drift: When Agents Lose the Thread — the post-compaction sibling failure mode
The Ralph Wiggum Loop — bounded-history restarts that keep cumulative weight low
Cognitive Reasoning vs Execution Separation — typed-interface variant of the architectural split
Discrete Phase Separation — conversation-boundary variant of the architectural split