Skip to content

Attention Latch: When Agents Stay Anchored to Stale Instructions

Cumulative historical context in decoder-only Transformers can over-squash mid-task updates, leaving multi-turn agents anchored to obsolete constraints despite explicit contradictory instructions.

The failure mode

An agent receives an instruction that contradicts an earlier one mid-session, then keeps acting on the earlier one. Shehata and Li (2026) name this the Attention Latch: the cumulative probabilistic weight of historical context overrides mid-task updates, anchoring the agent to obsolete constraints despite explicit contradictory instructions (Shehata & Li, 2026).

The latch is the behavioral face of Information Over-squashing in decoder-only autoregressive Transformers (Barbero et al., 2024). As history grows, distinct input sequences collapse to near-identical final-token representations, so a late instruction cannot move the representation far enough to change behavior.

Why over-squashing causes the latch

Decoder-only attention is causal: information from earlier tokens adds to the final-token representation through every later layer. The one-directional flow converging at the final token loses sensitivity to specific tokens, and low-precision floating-point formats make this worse (Barbero et al., 2024). The longer the history, the smaller the influence any single new instruction has.

This compounds with the U-shaped attention curve: a contradictory instruction inserted mid-session lands in the low-attention middle zone, where positional bias and over-squashing combine to suppress it (Liu et al., 2023).

graph TD
    H[Long multi-turn history] -->|cumulative weight| F[Final-token representation]
    U[Mid-task update] -->|low marginal influence| F
    F --> B[Behaviour anchored to old context]
    H -->|positional bias| M[Update in low-attention middle]
    M --> B

How to recognize it

This differs from an instruction-following failure on a fresh prompt. Three signals point to it:

  • The agent acknowledged the new instruction earlier in the turn but then acted on the old one — unlike objective drift, where the instruction silently falls out of context.
  • Resetting the conversation and reissuing the same instruction produces compliance.
  • Compliance returns when the contradicting prefix is removed.

If all three hold, the cause is structural over-squashing rather than ambiguous wording.

Where it triggers

Shehata and Li (2026) mapped the Attention Stability Boundary empirically across 9K trajectories on MultiWOZ 2.2. On the hardest tier — a semantic-hijacked 3-hop multi-fact synthesis task — vanilla ReAct on GPT-5.4 collapsed to 0.1% success (Shehata & Li, 2026). The agent crosses the boundary when:

  • Histories are long enough for cumulative weight to dominate.
  • Mid-task updates contradict, rather than extend, prior constraints.
  • Retrieval results inject content that semantically resembles the contradicted instruction.

Independent work confirms the wider pattern: sequences over 100K tokens show goal drift across model families, mostly through inaction (Arike et al., 2025); models deprioritize initial instructions as history grows, even when those instructions remain in context (Bui, 2026 §3.2).

Mitigations on a spectrum

Match the mitigation cost to how often the latch fires in your workload. Lightweight options first.

1. Recency anchoring (lightweight)

Push current objectives into the high-attention tail at every step. Goal recitation rewrites the objective and to-do list after each tool call. Event-driven system reminders inject the contradicting instruction as a fresh user-role message at the decision point. These do not remove over-squashing; they place the new instruction where attention is strongest.

2. History reset (medium)

Bound cumulative history before it dominates. The Ralph Wiggum Loop restarts each iteration from a fresh context, re-reading the specification from disk. Post-compaction re-read protocols restore foundational instructions after summarization. These address the latch at its root.

3. Architect and Executive separation (heavy)

Run high-level planning in one context (the Architect) and turn-by-turn execution in a separate, scoped context (the Executive) per turn — Shehata and Li's SSRP framework (Shehata & Li, 2026). Structural variants already covered on this site:

Choose this tier only after you have measured the lighter mitigations and found them not enough. The split adds an extra LLM call per turn, schema-versioning churn, and orchestration overhead, and most workloads do not cross the boundary (Microsoft Azure Architecture Center).

The grounding paradox

Heavy mitigations can overshoot. Shehata and Li (2026) report a Procedural Integrity audit at 98.8% adherence that reveals a Grounding Paradox: high-stability models fail by refusing to generate output under retrieval-reasoning contamination — the agent holds its ground so firmly it stops responding to legitimate updates (Shehata & Li, 2026). Verify that the failure has been removed, not relocated.

Where the latch does not fire

  • Short single-objective tasks. Cumulative history stays small relative to the latest turn.
  • Append-only updates. Extending prior context does not require overcoming over-squashing.
  • Aggressive harness-level resets. Frequent compaction or Ralph Wiggum-style restarts keep histories below the boundary.
  • Single-turn flows. The boundary is a multi-turn phenomenon.

Key Takeaways

  • The Attention Latch is the behavioural face of decoder-only over-squashing — a structural property, not a prompt bug.
  • It triggers when long histories collide with contradicting mid-task updates, especially in the U-shaped middle zone.
  • Mitigate on a spectrum: recency anchoring first, history reset next, architectural split only when measured drift justifies the overhead.
  • Heavy mitigations introduce the Grounding Paradox — verify the failure is removed, not relocated.
Feedback