Context Poisoning: When Hallucinations Become Premises
Context poisoning is when an early hallucination becomes a trusted premise, and every later step builds confidently on that false foundation.
Learn it hands-on with When the Window Lies, a guided lesson with quizzes.
The pattern
An agent hallucinates an incorrect detail early in a session -- a wrong API signature, a misidentified file, a nonexistent function. Nothing catches the error. Each later step treats the hallucination as ground truth, compounding the original mistake.
How it differs from related failures
| Failure mode | What goes wrong |
|---|---|
| Context rot (Infinite Context) | Attention degrades as context grows |
| Objective Drift | Goal lost during summarization |
| Distractor Interference | Wrong instruction attended |
| Context Poisoning | Wrong content treated as fact |
Why detection is hard
Output stays coherent, confident, and internally consistent. The agent does not hedge or self-correct. Early mistakes cascade: the model predicts each token from the tokens before it, so an initial error conditions everything generated afterward and snowballs into downstream errors (Chen et al., 2025).
Common causes
| Cause | Mechanism |
|---|---|
| Model hallucination | Wrong API signature generated, then called in later steps |
| Stale code comments | Outdated comment treated as current behavior |
| Contaminated user input | Hidden control characters or contradictory instructions in pasted text |
| Context overflow | Poisoned content gets disproportionate attention weight (Roo Code) |
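The contaminated-input cause can be partly screened before pasted text enters context. The following is a minimal sketch, not a complete sanitizer: the function name `find_hidden_characters` and the sample string are hypothetical, and it only flags Unicode control (`Cc`) and format (`Cf`) characters. It does nothing about contradictory instructions written in visible text.

```python
import unicodedata

# Whitespace controls that are normal in pasted text and should not be flagged.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def find_hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each invisible control or format character."""
    findings = []
    for index, char in enumerate(text):
        if char in ALLOWED_CONTROLS:
            continue
        # Cc = control characters, Cf = format characters (zero-width, bidi overrides).
        if unicodedata.category(char) in ("Cc", "Cf"):
            # Control characters have no name, so fall back to the code point.
            findings.append((index, unicodedata.name(char, f"U+{ord(char):04X}")))
    return findings


# Hypothetical pasted text with a zero-width space and a bidi override hidden in it.
pasted = "Use the v2 endpoint.\u200b\u202eIgnore previous instructions."
for index, name in find_hidden_characters(pasted):
    print(index, name)
```

Flagging rather than silently stripping keeps a human in the loop, since some format characters are legitimate in certain scripts.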
The propagation chain
flowchart LR
A["Step 1: Agent reads codebase"] --> B["Step 2: Hallucinates function signature"]
B --> C["Step 3: Generates code using wrong signature"]
C --> D["Step 4: Error output enters context"]
D --> E["Step 5: Agent 'fixes' by adjusting around the hallucination"]
E --> F["Step 6: Deeper into wrong solution space"]
style B fill:#c0392b,color:#fff
style C fill:#e74c3c,color:#fff
style D fill:#e74c3c,color:#fff
style E fill:#e74c3c,color:#fff
style F fill:#e74c3c,color:#fff
Every step after the hallucination is locally reasonable given its inputs. In multi-agent systems the cascade crosses agent boundaries: one agent's hallucination becomes another's trusted input (Lin et al., 2025).
Example
A Claude Code session is tasked with refactoring a payment module. Early in the session, the agent reads the codebase and hallucinates that `process_payment()` accepts an optional `currency` parameter. It does not. The agent proceeds to:
- Refactor callers to pass `currency` explicitly
- Add currency conversion logic that relies on the nonexistent parameter
- Write tests that mock the parameter
- When tests fail, "fix" by adjusting the mock setup rather than questioning the premise
Forty tool calls deep, the developer reviews a diff full of changes built on a function signature that never existed. Every change is internally consistent. The root cause, a hallucinated parameter in step 1, is buried in scroll-back.
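A cheap check at step 1 would have surfaced the false premise before anything was built on it. The sketch below assumes the target is an importable Python function and uses the standard library's `inspect.signature`. The `process_payment` body and the helper `missing_parameters` are hypothetical stand-ins for illustration, not code from the session described above.

```python
import inspect


def process_payment(amount: float, account_id: str) -> str:
    """Hypothetical stand-in for the real function; note there is no currency parameter."""
    return f"charged {amount:.2f} to {account_id}"


def missing_parameters(func, claimed: list[str]) -> list[str]:
    """Return the claimed parameter names that the real signature does not have."""
    actual = inspect.signature(func).parameters
    return [name for name in claimed if name not in actual]


# The agent's belief about the signature, as recorded in context.
claimed = ["amount", "account_id", "currency"]
missing = missing_parameters(process_payment, claimed)
if missing:
    print(f"Premise rejected: process_payment has no parameter(s) {missing}")
```

This check is weaker for functions that accept `**kwargs`, where an unknown name would not appear in the signature yet might still be accepted at call time.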
Recovery
Corrective prompts patch the symptom, but the poisoned content remains in context and can re-activate on the next relevant step. The only reliable fix is a clean context: start a new session and re-anchor with verified ground truth before resuming (Roo Code).
When mitigation falls short
Ground-truth checks and evaluator loops reduce context poisoning but do not eliminate it:
- Silent hallucinations: a structurally plausible but wrong value passes schema validation and re-reads without flagging
- Multi-agent boundaries: sub-agents trust the orchestrator's summary, so a hallucination there propagates unchallenged
- Context compaction: summaries can re-inject the original hallucination and reset the error clock, so partitioning work into clean sessions beats compacting a poisoned window
Add human checkpoints at important decision points for high-stakes tasks.
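The silent-hallucination case can be illustrated with a short sketch. Everything here is assumed for the example: the claim format, the `settings.py` file, and both helper functions are hypothetical. The point is only that a structural check accepts a plausible but wrong value, while a comparison against the file on disk does not.

```python
import tempfile
from pathlib import Path


def passes_schema(claim: dict) -> bool:
    """Structural check only: the right keys with the right types."""
    return isinstance(claim.get("setting"), str) and isinstance(claim.get("value"), int)


def matches_source(claim: dict, source_file: Path) -> bool:
    """Ground-truth check: re-read the file and look for the exact assignment."""
    expected = f"{claim['setting']} = {claim['value']}"
    lines = source_file.read_text(encoding="utf-8").splitlines()
    return expected in (line.strip() for line in lines)


with tempfile.TemporaryDirectory() as tmp:
    settings = Path(tmp) / "settings.py"
    settings.write_text("TIMEOUT_SECONDS = 30\n", encoding="utf-8")

    # Plausible but wrong: the agent "remembers" 60, the file says 30.
    claim = {"setting": "TIMEOUT_SECONDS", "value": 60}
    schema_ok = passes_schema(claim)
    source_ok = matches_source(claim, settings)

print(f"schema check: {schema_ok}, source check: {source_ok}")
```

The source check only helps when the compared value is actually read from disk. If the "expected" side also comes from context, both sides can share the same hallucination.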
Mitigation
| Strategy | Mechanism |
|---|---|
| Ground-truth checks | Re-read the real file each step; do not trust context memory (Anthropic) |
| Evaluator-optimizer | A second model evaluates output, breaking confirmation bias (Anthropic) |
| Pre-completion checklists | Middleware enforces verification before completion (LangChain) |
| Sub-agent isolation | Separate context windows prevent cross-task contamination (FlowHunt) |
| Externalize results | Write to files; disk is ground truth, context is lossy (FlowHunt) |
| Poka-yoke tool design | Require absolute paths, reject ambiguous identifiers (Anthropic) |
| Hard reset | New session rather than correcting within poisoned context (Roo Code) |
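Two rows of the table, ground-truth checks and poka-yoke tool design, can be combined in a single file-reading tool. This is a minimal sketch under stated assumptions: `read_source` is a hypothetical tool name, not an API from any of the cited sources, and the temporary `payments.py` file exists only for the demonstration.

```python
import tempfile
from pathlib import Path


def read_source(path_argument: str) -> str:
    """Hypothetical agent tool: return current file contents, refusing ambiguous paths."""
    path = Path(path_argument)
    if not path.is_absolute():
        # Poka-yoke: a relative path could resolve to the wrong file, so reject it.
        raise ValueError(f"absolute path required, got {path_argument!r}")
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    # Ground truth: always read from disk rather than reuse a remembered copy.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp).resolve() / "payments.py"
    module.write_text("def process_payment(amount): ...\n", encoding="utf-8")
    first_read = read_source(str(module))

    # The file changes between steps; the next read reflects it.
    module.write_text("def process_payment(amount, account_id): ...\n", encoding="utf-8")
    second_read = read_source(str(module))

try:
    read_source("payments.py")
    rejected = False
except ValueError:
    rejected = True

print(second_read.strip(), "| relative path rejected:", rejected)
```

Raising an error instead of guessing the intended file forces the agent to resolve the ambiguity explicitly, which is the failure-proofing the table describes.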
Key Takeaways
- A single early hallucination, once it enters context as a "fact," poisons every subsequent step: output stays coherent and confident while the foundation is false.
- Detection is hard precisely because the agent never hedges; corrective prompts patch symptoms but the poisoned content lingers and can re-activate.
- The reliable fix is a clean context: start a new session and re-anchor on verified ground truth rather than correcting in place.