Action-Audit Divergence: A Four-Mode Taxonomy for Runtime Hardening

A runtime action-audit divergence takes four forms — gate-bypass, audit-forgery, silent host failure, wrong-target — each a coverage question for existing controls.

What the runtime must guarantee

An agentic runtime issues tool calls and actuates devices on behalf of an LLM. Its load-bearing safety property is that the audit record matches what actually happened. Metere (arXiv:2605.01740) formalizes this as four divergence modes:

Mode	Name	What the audit lies about
F1	Gate-bypass	Authorization said no; the action ran
F2	Audit-forgery	Action ran; log shows a different action
F3	Silent host failure	Log says action ran; host did nothing
F4	Wrong-target	Log names target X; action hit target Y

The taxonomy is a navigation aid, not a defense — it converts "is this runtime hardened?" into four closed questions mapped to existing controls.

Formally, the property is a multiset equality: after every action, the multiset of intended (capability, target) pairs must equal the multiset of executed pairs (Metere, arXiv:2605.01740). A biconditional checker compares the two in both directions and logs denials, not just allows, so it fails closed on any difference. It detects divergence that the per-mode controls below fail to prevent.
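The multiset equality can be illustrated with Python's `collections.Counter`. This is a minimal sketch, not the checker from the cited paper: the function names `divergence` and `check_fail_closed` and the tuple layout are assumptions made for illustration.

```python
from collections import Counter


def divergence(intended, executed):
    """Compare two multisets of (capability, target) pairs.

    Returns (missing, unexpected):
      missing    - pairs that were intended but not executed
      unexpected - pairs that were executed but not intended
    """
    want = Counter(intended)
    got = Counter(executed)
    # Counter subtraction keeps only positive counts, so each side
    # reports exactly the surplus in one direction.
    return want - got, got - want


def check_fail_closed(intended, executed):
    """Raise if the two multisets differ in either direction."""
    missing, unexpected = divergence(intended, executed)
    if missing or unexpected:
        raise RuntimeError(
            f"audit divergence: missing={dict(missing)} "
            f"unexpected={dict(unexpected)}"
        )
```

Using a multiset rather than a set matters when the same action is intended twice but runs once: a set comparison would report no difference, while the counts here would.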

Mapping each mode to existing controls

graph TD
    F1[F1 Gate-Bypass] --> C1[Action-Selector + Admission Gate]
    F2[F2 Audit-Forgery] --> C2[Hash-Chained Tamper-Evident Log]
    F3[F3 Silent Host Failure] --> C3[Bootstrap Seal + Module Signing]
    F4[F4 Wrong-Target] --> C4[Egress Policy + URL/Target Validation]

    style F1 fill:#fce8e6,stroke:#d93025
    style F2 fill:#fef3e0,stroke:#e8a100
    style F3 fill:#e8f4fd,stroke:#1a73e8
    style F4 fill:#e6f4ea,stroke:#1e8e3e

F1, gate-bypass. Authorization rejected the request, but the action ran anyway. The control is a single chokepoint every tool call must pass. The action-selector pattern restricts the LLM to a fixed catalog, so unsanctioned actions cannot be expressed. The MCP runtime control plane intercepts every MCP call at one policy point. Logging denials, not just allows, closes the asymmetry attackers exploit when only allow-paths are visible.
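A single chokepoint that records denials as well as allows might look like the sketch below. The `Gate` class, the `CATALOG` set, and the policy shape (action name mapped to allowed targets) are hypothetical and stand in for whatever the runtime actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical fixed catalog: the only actions the runtime can express.
CATALOG = {"read_file", "send_email"}


@dataclass
class Gate:
    # Hypothetical policy shape: action name -> set of allowed targets.
    policy: dict
    # Every decision is recorded here, denials included.
    decisions: list = field(default_factory=list)

    def call(self, action, target, handler):
        allowed = (
            action in CATALOG
            and target in self.policy.get(action, set())
        )
        # Record the decision before anything runs, whatever the outcome.
        self.decisions.append(
            {"action": action, "target": target, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"denied: {action} -> {target}")
        # The handler is only reachable through this method.
        return handler(target)
```

The sketch only helps if handlers cannot be reached any other way; a second code path that invokes a handler directly reintroduces F1.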

F2, audit-forgery. The action ran and was logged, but the log was changed to claim a different action ran. Tamper-evident hash chains defeat this by construction. Each entry includes the hash of the previous one, so any change breaks the chain on verification (AuditableLLM, MDPI 2026). The site's Cryptographic Governance Audit Trail covers the implementation with ML-DSA-65 receipt signing.
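The chaining idea can be sketched in a few lines with SHA-256 from the standard library. This is an illustration under simplifying assumptions, not the implementation described in the cited sources: the entry layout and the names `append`, `verify`, and `GENESIS` are invented here, and receipt signing is omitted.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash, record):
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    )


def verify(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain on its own does not stop an attacker who can rewrite every entry and recompute all the hashes, which is why the head hash needs to be signed or anchored somewhere the attacker cannot reach.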

F3, silent host failure. The log records "action X executed", but the host did nothing — the process crashed, an error was swallowed, or the container was killed mid-call. The signal must come from outside the runtime. Bootstrap seals verify a known-good start state, module signing verifies that executing code matches audited code, and post-execution probes confirm the side effect landed. Without these, F3 looks identical to drift.
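The post-execution probe is the easiest of these to sketch: log the outcome the probe confirms, not the outcome the call claims. The helper `execute_and_probe` and its status strings are hypothetical, and for brevity the probe runs in the same process here, whereas the text above calls for a signal from outside the runtime.

```python
def execute_and_probe(name, action, probe, log):
    """Run an action, then record what an independent probe observes.

    action - zero-argument callable that performs the side effect
    probe  - zero-argument callable that returns True only if the
             side effect is observable afterwards
    """
    try:
        action()
    except Exception as exc:  # broad on purpose: any failure is logged
        log.append({"action": name, "status": "failed", "error": repr(exc)})
        return False
    # A clean return is not evidence; only the probe result is logged.
    confirmed = bool(probe())
    log.append(
        {"action": name, "status": "confirmed" if confirmed else "unconfirmed"}
    )
    return confirmed
```

An action that swallows its own error returns cleanly, so only the probe distinguishes it from a success.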

F4, wrong-target. The log says "emailed alice@", but the message went to attacker@. The control is target validation at the egress boundary, not at argument generation. The agent network egress policy restricts reachable domains. The URL exfiltration guard validates targets independently of LLM intent.
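Validation at the egress boundary means parsing the final target and checking it against an allowlist, regardless of what the model said it intended. The sketch below uses `urllib.parse.urlsplit` from the standard library; the allowlists and the function names `validate_url` and `validate_recipient` are placeholders for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlists; a real deployment would load these from policy.
ALLOWED_HOSTS = {"api.example.com"}
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}


def validate_url(url):
    """Allow only https URLs whose parsed hostname is on the allowlist."""
    parts = urlsplit(url)
    # Use the parsed hostname, not a substring match, so that
    # "https://api.example.com@evil.test/" resolves to "evil.test".
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or host not in ALLOWED_HOSTS:
        raise ValueError(f"egress denied: {url!r}")
    return url


def validate_recipient(address):
    """Allow only addresses whose domain is on the allowlist."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or domain.lower() not in ALLOWED_RECIPIENT_DOMAINS:
        raise ValueError(f"recipient denied: {address!r}")
    return address
```

Both functions return the validated value so the caller can pass on exactly what was checked, not a second copy of the LLM-generated argument.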

Using the taxonomy as a review checklist

Walk F1-F4 against any runtime or harness:

F1, name the chokepoint. Where does every tool call pass authorization? "The LLM checks" is not a chokepoint — the LLM is what is being authorized.
F2, name the integrity mechanism. Append-only is not enough. The log must be tamper-evident even with an attacker on the host. Hash chains, Merkle trees, or external receipt sinks (nono.sh on tamper-evident agent audit) close the gap.
F3, name the liveness probe. What confirms the action actually ran? Side-effect verification, downstream acks, or out-of-band telemetry beat "the call returned 200".
F4, name the target validator. What checks that the file path, hostname, recipient, or endpoint is the intended one, independent of LLM-generated arguments? HashiCorp's write-up frames this as unifying infrastructure telemetry with identity logs.

A control may cover multiple modes (a hash-chained log with policy receipts covers F1 and F2), and a mode may need several controls. The taxonomy does not prescribe — it names the question each control answers.
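Because the mapping between controls and modes is many-to-many, a review can be recorded as a small coverage table and checked for gaps. The sketch below assumes a hypothetical dictionary from control name to the set of modes it claims to cover; the control names are examples only.

```python
MODES = ("F1", "F2", "F3", "F4")


def uncovered_modes(controls):
    """Return the modes that no listed control claims to cover.

    controls - dict mapping a control name to a set of mode labels
    """
    covered = set()
    for modes in controls.values():
        covered.update(modes)
    # Preserve F1..F4 order in the report.
    return [mode for mode in MODES if mode not in covered]


# Example review: one control covers two modes, and one mode has no answer.
review = {
    "hash-chained log with policy receipts": {"F1", "F2"},
    "egress policy": {"F4"},
}
gaps = uncovered_modes(review)
```

In this example `gaps` contains only `"F3"`, so the review has no named liveness probe. A claimed mapping is not evidence that the control works; the table only shows which questions have no answer yet.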

Where the framing backfires

The decomposition assumes there is an audit worth defending. In three conditions it adds cost without value:

Single-user local runtimes with no compliance obligation. F1-F4 each motivate non-trivial architecture, so capability minimization and rollback-first design deliver more safety per unit of complexity.
Pure-text agents. Without tool calls, there is no action to diverge from an audit.
Reversible-state systems. When every action can be rolled back once a fault is detected, detection latency matters more than post-hoc tamper evidence.

The divergence taxonomy complements the four-layer threat taxonomy. That model groups threats by attack surface; this one groups runtime safety properties by failure mode. One places controls on a grid, the other audits whether the grid is load-bearing.

Key Takeaways

An agent runtime's load-bearing safety property is that the audit record matches what actually happened.
Four divergence modes — F1 gate-bypass, F2 audit-forgery, F3 silent host failure, F4 wrong-target — name the specific ways the audit can lie.
Each mode maps to existing site coverage: action-selector and MCP control plane for F1, hash-chained audit trail for F2, bootstrap and module signing for F3, egress and URL validation for F4.
Use the taxonomy as a review checklist, not a defense — name the chokepoint, integrity mechanism, liveness probe, and target validator for any runtime under review.
The framing assumes an audit worth defending; for single-user local runtimes, pure-text agents, and reversible-state systems, capability minimization often beats divergence detection.

Four-Layer Taxonomy of Agent Security Risks — companion threat-surface layering; pair with this divergence-mode model
Cryptographic Governance Audit Trail — F2 control: hash-chained tamper-evident logs with ML-DSA receipts
Action-Selector Pattern — F1 control: deterministic execution from a fixed action catalog
MCP Runtime Control Plane — F1 control: single chokepoint for tool-call policy evaluation
Agent Network Egress Policy — F4 control: target validation at the network boundary
Tool Signing and Signature Verification — F3 control: module-level integrity for executing code