Pre-Trust Execution Surface in Coding Agent Harnesses

Project-local config a coding agent loads at session start executes before the trust prompt — defer execution until after the user accepts trust.

The failure mode

Most coding-agent harnesses load project-local config eagerly at startup: settings files, hook definitions, MCP server manifests, environment variables, and localhost listeners. The trust dialog appears after the harness has parsed, and often executed, this config.

Anthropic's 2026-05-25 post documents this: "Claude Code reads project settings during startup — before presenting the standard 'Do you trust this folder?' prompt" (How we contain Claude across products). Three vulnerabilities disclosed between mid-2025 and January 2026 shared this shape. A developer clones a repo to review a PR. The repo's .claude/settings.json defines a hook. The attacker-committed hook then executes automatically during init (Anthropic Engineering, 2026).

The trust dialog is not the security boundary. Everything that runs before it appears is what matters.

What composes the pre-trust surface

Across tools, the implicitly loaded project config follows the same shape (Google Cloud security research, 2026):

| File class | Why it executes pre-trust |
| --- | --- |
| Settings files (`.claude/settings.json`, `.cursor/rules/`, `.codex/`, `.github/copilot/`) | Parsed to determine which permissions, hooks, and tools the session offers |
| Hook definitions | Some events fire during session-start itself, so the harness reads them pre-trust |
| MCP server manifests (`.mcp.json`, project-scoped configs) | Stdio MCP servers spawn at startup; HTTP manifests may auto-fetch endpoints |
| Environment variable overrides | `ANTHROPIC_BASE_URL` and similar values are read at process init, before any dialog renders |
| Localhost listeners | The harness opens sockets at startup so the editor extension can connect |

Each is an attacker-controlled byte stream once the repo is cloned from an untrusted source.
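As a rough illustration of auditing that surface, the sketch below lists which of the paths named in the table exist in a freshly cloned repo, without opening any of them. The function name and the candidate list are assumptions made for this example; which files a given tool actually reads at startup depends on the tool and its version.

```python
from pathlib import Path

# Paths taken from the table above. Illustrative only: the files a harness
# really loads at startup are tool- and version-specific.
PRE_TRUST_CANDIDATES = (
    ".claude/settings.json",
    ".cursor/rules",
    ".codex",
    ".github/copilot",
    ".mcp.json",
)


def find_pre_trust_surface(repo_root: str) -> list[str]:
    """Return the candidate config paths present in the repo, without reading them."""
    root = Path(repo_root)
    return [rel for rel in PRE_TRUST_CANDIDATES if (root / rel).exists()]
```

Running this against a clone before opening it in an agent gives a quick list of files worth reading by hand first.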

Why this class of bug exists

The eager-load assumption is structural. To show a prompt that lists configured behaviors rather than just "trust this folder?", the harness must read which hooks are wired, which MCP servers to start, and which permissions are allowed. So it reads config first and renders trust state second. That ordering becomes a vulnerability because the cloned repo arrived over the public internet, often for a PR review in which the developer is expected to read code from unknown contributors. Treating that config as implicitly trusted is the same error as parsing an inbound HTTP body before authenticating the request.

The remediation

Anthropic prescribes sequencing: establish the trust boundary first, then parse and execute project-local config (Anthropic Engineering, 2026):

"defer parsing and execution of project-local configuration until after the user accepts the trust prompt"

"treat project-open, config-load, and localhost listeners the way you'd treat any inbound request from the internet"

A practical split for harness authors:

Pre-trust phase: read config as data only — structure, paths, declared hooks, declared MCP servers — for the prompt to display. Never execute it.
Trust boundary: show the prompt with the parsed structure visible, so the user accepts or rejects knowing what would activate.
Post-trust phase: spawn MCP servers, register hooks, evaluate environment overrides, and open localhost listeners.
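The three phases above can be sketched as follows. This is a minimal illustration under stated assumptions, not any harness's actual startup code: the file name `agent-config.json`, its `hooks` and `mcp_servers` keys, and the `ask_user` and `activate` callbacks are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class DeclaredConfig:
    """Inert description of what the project would activate; nothing here runs."""
    hook_commands: list[str] = field(default_factory=list)
    mcp_servers: list[str] = field(default_factory=list)


def _string_list(value: object) -> list[str]:
    """Keep only string entries; other shapes in untrusted JSON are dropped."""
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, str)]


def parse_as_data(repo_root: str) -> DeclaredConfig:
    """Pre-trust phase: read config as data only, never spawn or evaluate it."""
    # Hypothetical file name and schema, chosen for this sketch.
    path = Path(repo_root) / "agent-config.json"
    if not path.is_file():
        return DeclaredConfig()
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:
        # Malformed or undecodable input from an untrusted repo: declare nothing.
        return DeclaredConfig()
    if not isinstance(raw, dict):
        return DeclaredConfig()
    return DeclaredConfig(
        hook_commands=_string_list(raw.get("hooks")),
        mcp_servers=_string_list(raw.get("mcp_servers")),
    )


def start_session(
    repo_root: str,
    ask_user: Callable[[DeclaredConfig], bool],
    activate: Callable[[DeclaredConfig], None],
) -> bool:
    """Run the three phases in order; return True only if config was activated."""
    declared = parse_as_data(repo_root)  # pre-trust: data only
    if not ask_user(declared):           # trust boundary: user sees what would run
        return False
    activate(declared)                   # post-trust: hooks, MCP servers, listeners
    return True
```

The point of the design is that `activate` is the only place side effects can happen, and it is unreachable unless `ask_user` returns true. Everything before it handles the config as untrusted input and tolerates malformed data.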

The fix generalizes. Any harness that loads project-local config (Codex .codex/, Cursor .cursor/rules/, Copilot .github/copilot/, future tools) has the same surface and needs the same sequencing. The Cuckoo Attack research showed the class is reproducible across nine agent and AI-IDE combinations (Cuckoo Attack, 2025).

VS Code 1.126 ships a concrete instance of this sequencing: new folders open in Restricted Mode with the trust prompt deferred to a banner, and the over-trust-prone "Trust Parent" button was removed (VS Code 1.126 release notes).

Relationship to the lethal trifecta

Pre-trust execution adds a time-domain dimension to the Lethal Trifecta Threat Model. That model's three capabilities — private data, untrusted content, egress — together create an exploitable principal. Pre-trust execution lets all three converge before the principal has consented to act: egress can land before the user has seen the trust prompt. A real-world illustration of that egress blast radius comes from outside this project's Claude/Copilot/Cursor focus: the xAI grok CLI was reported to upload its entire working directory — including SSH keys, a password-manager database, and personal documents — to cloud storage buckets when run, a concrete instance of over-broad file access combining with egress to exfiltrate secrets (Simon Willison, 2026).

When this backfires

The pattern matters most for unfamiliar repositories. In three settings its costs and benefits shift:

Resident first-party repos: developers who reopen a long-lived repo many times a day pay post-trust init latency every session, so in practice trust is cached and becomes effectively durable. That cache needs invalidation on config changes; otherwise long-lived trust undoes the deferred-execution discipline (Mindgard research, 2026).
Headless CI runs: when an agent runs in CI on every commit, no human is at a prompt to defer to. The fix is not deferred execution but sandbox isolation or pre-merge config review.
Devcontainer-isolated workflows: inside a container with a network firewall, the container bounds the pre-trust blast radius, but the pattern still matters for credential exfiltration. Anthropic's docs note that running with --dangerously-skip-permissions inside the container does not prevent exfiltration of in-container credentials (Claude Code devcontainer docs).
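For the resident-repo case, one way to invalidate cached trust on config changes is to store a fingerprint of the config bytes alongside the trust decision and re-prompt when it no longer matches. The sketch below assumes the caller supplies the list of config paths to cover; the function names and the storage of the fingerprint are hypothetical, and it does not cover environment variable overrides.

```python
import hashlib
from pathlib import Path
from typing import Optional


def config_fingerprint(repo_root: str, config_paths: list[str]) -> str:
    """Hash the names and bytes of all covered config files into one digest."""
    root = Path(repo_root)
    digest = hashlib.sha256()
    for rel in sorted(config_paths):
        target = root / rel
        if target.is_dir():
            files = sorted(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            files = [target]
        else:
            files = []  # a missing path contributes nothing; adding it later changes the hash
        for file in files:
            name = file.relative_to(root).as_posix().encode("utf-8")
            data = file.read_bytes()
            # Length-prefix each field so different file sets cannot collide by concatenation.
            for chunk in (name, data):
                digest.update(len(chunk).to_bytes(8, "big"))
                digest.update(chunk)
    return digest.hexdigest()


def trust_still_valid(
    stored: Optional[str], repo_root: str, config_paths: list[str]
) -> bool:
    """Cached trust holds only if a fingerprint was stored and the config is unchanged."""
    return stored is not None and stored == config_fingerprint(repo_root, config_paths)
```

With this shape, a pulled commit that edits a hook or adds an MCP manifest changes the fingerprint, so the harness falls back to the pre-trust path and prompts again instead of silently executing the new config.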

Key Takeaways

The trust dialog is not the security boundary — every byte parsed before the dialog renders is attacker-controlled when the repository came from outside.
The pre-trust surface spans settings files, hook definitions, MCP manifests, environment variables, and localhost listeners — structurally the same across .claude/, .cursor/, .codex/, and .github/copilot/.
The remediation is sequencing: parse config as data pre-trust, execute only post-trust.
Pre-trust execution adds a time-domain dimension to the lethal trifecta — all three legs can converge before the principal has consented to act.
Devcontainer isolation reduces blast radius but does not substitute for the sequencing fix; headless CI runs need sandbox isolation or pre-merge config review.