Lethal Trifecta Threat Model¶
The lethal trifecta is private data, untrusted input, and external egress on one path — remove at least one leg from every execution path.
Learn it hands-on with The Lethal Trifecta guided lesson, which includes quizzes.
The three legs¶
The lethal trifecta (Willison, 2025) names three capabilities that together create an exploitable surface:
graph TD
PD["1. Private Data Access"]
UI["2. Untrusted Input"]
EC["3. External Communication"]
PD --- RISK["Exploitable<br/>Attack Surface"]
UI --- RISK
EC --- RISK
style RISK fill:#b60205,color:#fff,stroke:#b60205
| Leg | What it means | Examples |
|---|---|---|
| Private data | Secrets, credentials, PII, or proprietary code | .env files, DB connections, internal repos |
| Untrusted input | Content the agent did not author and cannot fully trust | PR comments, GitHub issues, fetched pages, dependencies |
| External communication | Ability to send data outside the sandbox | HTTP tools, MCP servers with outbound calls |
LLMs cannot reliably separate trusted instructions from injected ones. Once untrusted input enters context, it influences tool calls. The trifecta moves defense from prompt-level mitigation to architecture.
Remove a leg¶
No execution path should hold all three legs. Which leg to remove depends on the task.
Remove egress (most common for coding agents)¶
Default-deny outbound network — most coding tasks need none.
# Docker-based sandbox — no network
docker run --network none agent-image
Vendors ship this as a deterministic control: OpenAI's Lockdown Mode caps outbound requests with no AI evaluation in the loop — no reliance on the model to police itself (Willison, 2026).
Remove private data access¶
Strip sensitive data before it reaches context. You have three options:
- PII tokenization — replace real values with opaque tokens that a trusted executor resolves
- Scoped credentials — inject short-lived, minimal-permission tokens at runtime
- File exclusion — keep
.env, credentials, and key files out of agent-accessible paths
Remove untrusted input¶
Restrict the agent to operator-controlled content — viable for internal automation, not external or user-generated content.
Design patterns for trifecta mitigation¶
Six patterns (Beurer-Kellner et al., 2025) map to leg removal:
| Pattern | Leg removed | Mechanism |
|---|---|---|
| Dual LLM | Untrusted input | Privileged LLM decides; quarantined LLM handles untrusted content |
| Action-Selector | Untrusted input | LLM picks from a fixed action set; injected instructions can't add new actions |
| Plan-Then-Execute | Untrusted input | Plan formed before untrusted content is seen; execution is deterministic |
| Context-Minimization | Untrusted input | Only minimum necessary untrusted content enters context |
| Code-Then-Execute | Untrusted input | LLM generates code; sandboxed runtime executes without LLM re-evaluation |
| LLM Map-Reduce | Private data | Each instance sees only a partition; no single instance has full data access |
CaMeL (Debenedetti et al., 2025) enforces separation via control- and data-flow primitives — 77% task completion with provable security.
Attack chains¶
Poisoned dependency (Lynch / NVIDIA, 2025): an agent reads a GitHub issue that names a malicious pip package and installs it (egress). The package then exfiltrates env vars (private data). Fix: remove egress.
Cross-agent privilege escalation (Embrace The Red, 2025): one agent rewrites another's config to drop sandbox constraints, granting all three legs. Fix: protect config from writes.
MCP tool exfiltration (Invariant Labs, 2025): a malicious MCP server shadows trusted tools, reads private context, and forwards it externally. Fix: restrict MCP server egress.
Trifecta audit checklist¶
| Execution path | Private data? | Untrusted input? | Egress? | Safe? |
|---|---|---|---|---|
| Code review agent | Yes | Yes (PR content) | No | Yes |
| Research agent | No | Yes (web) | Yes | Yes |
| Deployment agent with env vars | Yes | Yes (repo config) | Yes | No |
| Internal codegen | Yes | No | Yes | Yes |
Three "Yes" values require architectural mitigation.
Mandatory sandbox controls¶
Set four controls (Harang, 2025):
- Network egress — default-deny with explicit allowlists
- File system — block writes outside the workspace
- Config protection — prevent changes to
.cursorrules,CLAUDE.md, and MCP configs - Secret injection — short-lived, minimal-permission tokens
When this backfires¶
The trifecta model is a structural heuristic, not a guarantee:
-
Leg removal is not always feasible. A research agent that fetches live web content, holds API keys, and posts to external endpoints has all three legs by design. For unavoidable trifectas, add compensating controls such as output scanning, rate-limiting, and egress anomaly detection.
-
Partial-leg states are underspecified. "Read-only egress" and "tokenized private data" sit between leg-present and leg-absent. Binary Yes/No columns produce false confidence when a leg is partially present.
-
Leg removal migrates risk. Tokenizing PII shifts the attack to the token resolver, and sandboxing egress shifts it to sandbox-escape. Each removal creates a new high-value target that you must harden in turn.
Key Takeaways¶
- Risk requires all three legs at once: private data, untrusted input, and external egress. Removing any one closes the exfiltration path.
- Remove egress first for coding agents — most tasks need no network, and a default-deny sandbox is a deterministic control the model cannot override.
- Audit per execution path, not per agent. A single path with three "Yes" values demands architectural mitigation, not prompt-level defenses.
- Leg removal migrates risk rather than erasing it: each removed leg creates a new high-value target (token resolver, sandbox boundary) that must itself be hardened.