
Lethal Trifecta Threat Model

The lethal trifecta is private data, untrusted input, and external egress on one path — remove at least one leg from every execution path.

Learn it hands-on with The Lethal Trifecta guided lesson, which includes quizzes.

The three legs

The lethal trifecta (Willison, 2025) names three capabilities that together create an exploitable surface:

graph TD
    PD["1. Private Data Access"]
    UI["2. Untrusted Input"]
    EC["3. External Communication"]

    PD --- RISK["Exploitable<br/>Attack Surface"]
    UI --- RISK
    EC --- RISK

    style RISK fill:#b60205,color:#fff,stroke:#b60205

| Leg | What it means | Examples |
|---|---|---|
| Private data | Secrets, credentials, PII, or proprietary code | .env files, DB connections, internal repos |
| Untrusted input | Content the agent did not author and cannot fully trust | PR comments, GitHub issues, fetched pages, dependencies |
| External communication | Ability to send data outside the sandbox | HTTP tools, MCP servers with outbound calls |

LLMs cannot reliably separate trusted instructions from injected ones. Once untrusted input enters context, it influences tool calls. The trifecta moves defense from prompt-level mitigation to architecture.

Remove a leg

No execution path should hold all three legs. Which leg to remove depends on the task.

Remove egress (most common for coding agents)

Default-deny outbound network — most coding tasks need none.

# Docker-based sandbox — no network
docker run --network none agent-image

Vendors ship this as a deterministic control: OpenAI's Lockdown Mode caps outbound requests with no AI evaluation in the loop — no reliance on the model to police itself (Willison, 2026).

Remove private data access

Strip sensitive data before it reaches context. You have three options:

  • PII tokenization — replace real values with opaque tokens that a trusted executor resolves
  • Scoped credentials — inject short-lived, minimal-permission tokens at runtime
  • File exclusion — keep .env, credentials, and key files out of agent-accessible paths
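
The PII tokenization option above can be sketched as follows. This is a minimal illustration, not a production design: `TokenVault`, the `<PII_...>` token format, and the email-only regex are all assumptions made for the example. The idea is that the model only ever sees the output of `tokenize`, while `resolve` runs in a trusted executor outside the model's context.

```python
import re
import secrets

# Deliberately simple email matcher; real PII detection needs far more coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TokenVault:
    """Hypothetical trusted component that maps opaque tokens to real values."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, text: str) -> str:
        """Replace each email address with a stable opaque token."""

        def _swap(match):
            value = match.group(0)
            token = self._by_value.get(value)
            if token is None:
                # Random token: carries no information about the value.
                token = f"<PII_{secrets.token_hex(4)}>"
                self._by_value[value] = token
                self._by_token[token] = value
            return token

        return EMAIL_RE.sub(_swap, text)

    def resolve(self, text: str) -> str:
        """Restore real values; call only from the trusted executor."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text
```

The vault itself now holds the private data, so it has to live outside anything the agent can read or call directly.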

Remove untrusted input

Restrict the agent to operator-controlled content — viable for internal automation, not external or user-generated content.

Design patterns for trifecta mitigation

Six patterns (Beurer-Kellner et al., 2025) map to leg removal:

| Pattern | Leg removed | Mechanism |
|---|---|---|
| Dual LLM | Untrusted input | Privileged LLM decides; quarantined LLM handles untrusted content |
| Action-Selector | Untrusted input | LLM picks from a fixed action set; injected instructions can't add new actions |
| Plan-Then-Execute | Untrusted input | Plan formed before untrusted content is seen; execution is deterministic |
| Context-Minimization | Untrusted input | Only minimum necessary untrusted content enters context |
| Code-Then-Execute | Untrusted input | LLM generates code; sandboxed runtime executes without LLM re-evaluation |
| LLM Map-Reduce | Private data | Each instance sees only a partition; no single instance has full data access |

CaMeL (Debenedetti et al., 2025) enforces separation via control- and data-flow primitives, completing 77% of AgentDojo tasks with provable security.
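
As one concrete instance of the patterns listed above, Action-Selector can be sketched in a few lines. The action names and the `dispatch` helper are hypothetical; the point is only that the model's output is treated as a key into an operator-defined table, never as something to execute.

```python
from typing import Callable, Dict


def open_ticket() -> str:
    return "ticket opened"


def summarize_issue() -> str:
    return "summary queued"


# Fixed action set chosen by the operator; nothing outside it can run.
ALLOWED_ACTIONS: Dict[str, Callable[[], str]] = {
    "open_ticket": open_ticket,
    "summarize_issue": summarize_issue,
}


def dispatch(model_choice: str) -> str:
    """Run the action named by the model only if it is in the fixed set."""
    action = ALLOWED_ACTIONS.get(model_choice.strip())
    if action is None:
        raise ValueError(f"action not in allowlist: {model_choice!r}")
    return action()
```

Injected text can still steer which allowed action is picked, so the fixed set should contain only actions that are safe to trigger on untrusted input.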

Attack chains

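Poisoned dependency (Lynch / NVIDIA, 2025): an agent reads a GitHub issue (untrusted input) that names a malicious pip package and installs it, which requires outbound network access (egress). Once installed, the package reads env vars (private data) and sends them out over the same network path. Fix: remove egress.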

Cross-agent privilege escalation (Embrace The Red, 2025): one agent rewrites another's config to drop sandbox constraints, granting all three legs. Fix: protect config from writes.

MCP tool exfiltration (Invariant Labs, 2025): a malicious MCP server shadows trusted tools, reads private context, and forwards it externally. Fix: restrict MCP server egress.

Trifecta audit checklist

| Execution path | Private data? | Untrusted input? | Egress? | Safe? |
|---|---|---|---|---|
| Code review agent | Yes | Yes (PR content) | No | Yes |
| Research agent | No | Yes (web) | Yes | Yes |
| Deployment agent with env vars | Yes | Yes (repo config) | Yes | No |
| Internal codegen | Yes | No | Yes | Yes |

Three "Yes" values require architectural mitigation.
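
The checklist can also be run mechanically. The sketch below encodes the four example rows from the table; `ExecutionPath` and `unsafe_paths` are illustrative names, and the boolean fields assume each leg is either fully present or fully absent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPath:
    name: str
    private_data: bool
    untrusted_input: bool
    egress: bool

    def is_safe(self) -> bool:
        # A path is unsafe only when all three legs are present together.
        return not (self.private_data and self.untrusted_input and self.egress)


# The four example rows from the audit table.
PATHS = [
    ExecutionPath("Code review agent", True, True, False),
    ExecutionPath("Research agent", False, True, True),
    ExecutionPath("Deployment agent with env vars", True, True, True),
    ExecutionPath("Internal codegen", True, False, True),
]


def unsafe_paths(paths):
    """Return the names of paths that hold all three legs."""
    return [p.name for p in paths if not p.is_safe()]
```

A check like this could run in CI against a declared inventory of agent paths, failing the build when a new path acquires its third leg.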

Mandatory sandbox controls

Set four controls (Harang, 2025):

  • Network egress — default-deny with explicit allowlists
  • File system — block writes outside the workspace
  • Config protection — prevent changes to .cursorrules, CLAUDE.md, and MCP configs
  • Secret injection — short-lived, minimal-permission tokens
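
The first three controls reduce to simple deterministic checks that run outside the model. The sketch below is an assumption-laden illustration: the allowlisted hosts are placeholders, `mcp.json` is an assumed filename for the MCP config, and real enforcement belongs in the sandbox or network layer rather than in application code. Secret injection is not shown.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical allowlist; default-deny means anything not listed is refused.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

# Config files the agent must never modify ("mcp.json" is an assumed name).
PROTECTED_NAMES = {".cursorrules", "CLAUDE.md", "mcp.json"}


def egress_allowed(url: str) -> bool:
    """Allow a request only if its host is explicitly allowlisted."""
    host = urlsplit(url).hostname
    return host is not None and host in ALLOWED_HOSTS


def write_allowed(workspace: Path, target: Path) -> bool:
    """Allow a write only inside the workspace and never to protected config."""
    root = workspace.resolve()
    # Resolving first collapses ".." segments and follows symlinks.
    resolved = (root / target).resolve()
    if resolved.name in PROTECTED_NAMES:
        return False
    return resolved == root or root in resolved.parents
```

For example, `write_allowed(Path("/tmp/agent-ws"), Path("../outside.txt"))` is rejected because the resolved target falls outside the workspace root.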

When this backfires

The trifecta model is a structural heuristic, not a guarantee:

  1. Leg removal is not always feasible. A research agent that fetches live web content, holds API keys, and posts to external endpoints has all three legs by design. For unavoidable trifectas, add compensating controls such as output scanning, rate-limiting, and egress anomaly detection.

  2. Partial-leg states are underspecified. "Read-only egress" and "tokenized private data" sit between leg-present and leg-absent. Binary Yes/No columns produce false confidence when a leg is partially present.

  3. Leg removal migrates risk. Tokenizing PII shifts the attack to the token resolver, and sandboxing egress shifts it to sandbox-escape. Each removal creates a new high-value target that you must harden in turn.
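
For the unavoidable-trifecta case in item 1, output scanning might look like the sketch below. The patterns and the `send_if_clean` wrapper are illustrative assumptions. Pattern matching is best-effort and can be evaded by encoding or splitting a secret, so it is a compensating control and does not stand in for removing a leg.

```python
import re

# Illustrative patterns only; a real scanner needs a much broader rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S{8,}"),
]


def scan_outbound(payload: str) -> list:
    """Return the patterns that matched; an empty list means no hit."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(payload)]


def send_if_clean(payload: str, send) -> bool:
    """Call send(payload) only when the scan finds nothing; report whether it ran."""
    if scan_outbound(payload):
        return False
    send(payload)
    return True
```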

Key Takeaways

  • Risk requires all three legs at once: private data, untrusted input, and external egress. Removing any one closes the exfiltration path.
  • Remove egress first for coding agents — most tasks need no network, and a default-deny sandbox is a deterministic control the model cannot override.
  • Audit per execution path, not per agent. A single path with three "Yes" values demands architectural mitigation, not prompt-level defenses.
  • Leg removal migrates risk rather than erasing it: each removed leg creates a new high-value target (token resolver, sandbox boundary) that must itself be hardened.