
Sandbox-Enforced PII Tokenization in Agent Workflows

Sandbox-enforced PII tokenization replaces sensitive fields with deterministic tokens before data reaches the model, so real values never enter the context window.

Related lesson: The Payload That Waits covers this concept in a hands-on lesson with quizzes.

PII tokenization replaces sensitive field values — emails, names, account numbers — with deterministic placeholder tokens before they reach the model's context window. The sandbox enforces the boundary: real values never reach the model, and de-tokenization happens only inside the sandbox when downstream tools need the original data.

Why model context is a data risk

Any data an agent reasons about enters its context window. Inference infrastructure can log, cache, or observe what sits there. In regulated domains such as healthcare, finance, and legal, putting patient identifiers, account numbers, or contact details into model context creates data residency and compliance exposure.

Anthropic's MCP code execution research describes the sandbox-as-privacy-boundary pattern: sensitive values move between tools inside the sandbox while the model sees only deterministic placeholders.

How tokenization works

Before data surfaces to the model, the execution environment replaces sensitive field values with deterministic tokens:

| Original | Tokenized |
| --- | --- |
| alice@example.com | {{EMAIL_1}} |
| Jane Smith | {{NAME_1}} |
| 4111-1111-1111-1111 | {{CC_1}} |

The sandbox maintains a token-to-value mapping. When a downstream tool needs the real value — to send an email, make a payment, or write to a database — de-tokenization happens inside the sandbox before the call.

graph TD
    A[Raw data fetched] --> B[Sandbox tokenizes PII fields]
    B --> C[Tokenized data enters model context]
    C --> D[Agent reasons and makes tool calls with tokens]
    D --> E[Sandbox de-tokenizes for downstream tools]
    E --> F[Real values used inside sandbox only]

The model only ever sees tokens. Real values stay inside the sandbox.
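The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `TokenMap`, `tokenize_record`, and the `SENSITIVE_FIELDS` table are hypothetical names, and the sketch assumes sensitive fields are identified by field name and held in memory for one session.

```python
from dataclasses import dataclass, field


@dataclass
class TokenMap:
    """Session-scoped token map (hypothetical helper, not a library API)."""
    _by_value: dict = field(default_factory=dict)  # (kind, value) -> token
    _by_token: dict = field(default_factory=dict)  # token -> real value
    _counters: dict = field(default_factory=dict)  # kind -> last index issued

    def tokenize(self, kind: str, value: str) -> str:
        key = (kind, value)
        if key in self._by_value:
            # Same value, same token: this is what makes tokens deterministic.
            return self._by_value[key]
        n = self._counters.get(kind, 0) + 1
        self._counters[kind] = n
        token = "{{" + f"{kind}_{n}" + "}}"
        self._by_value[key] = token
        self._by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Intended to be called only inside the sandbox, never by the model.
        return self._by_token[token]


# Assumption: sensitive fields are known by name; real systems may need detection.
SENSITIVE_FIELDS = {"email": "EMAIL", "name": "NAME", "card": "CC"}


def tokenize_record(record: dict, tokens: TokenMap) -> dict:
    """Return a copy of the record that is safe to place in model context."""
    safe = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            safe[key] = tokens.tokenize(SENSITIVE_FIELDS[key], value)
        else:
            safe[key] = value
    return safe


tokens = TokenMap()
raw = {"id": 7, "name": "Jane Smith", "email": "alice@example.com",
       "card": "4111-1111-1111-1111"}
safe = tokenize_record(raw, tokens)
```

Here `safe` carries `{{NAME_1}}`, `{{EMAIL_1}}`, and `{{CC_1}}` in place of the real values, while `tokens.detokenize("{{EMAIL_1}}")` recovers the address for code running inside the sandbox.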

What the agent can still do

Tokenization does not block useful work. With tokenized data, the agent can:

  • Count records: "847 records have {{EMAIL_N}} fields"
  • Filter by structure: "Records where {{CC_N}} is present but {{EMAIL_N}} is missing"
  • Detect patterns: "All {{NAME_N}} values follow a given format"
  • Route records to queues

The agent reasons about structure, counts, and relationships — not the values themselves. This is enough for most analytical and routing tasks.
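As a rough illustration of the operations listed above, the following sketch counts, filters, and routes records using only token type prefixes. The record layout, the `token_kind` helper, and the queue names are assumptions made for the example.

```python
import re

# Matches a whole token such as {{EMAIL_3}} and captures its type prefix.
TOKEN_RE = re.compile(r"^\{\{([A-Z]+)_(\d+)\}\}$")


def token_kind(value):
    """Return the type prefix of a token, or None if the value is not a token."""
    if not isinstance(value, str):
        return None
    match = TOKEN_RE.match(value)
    return match.group(1) if match else None


# Tokenized records as the model would see them (no real values present).
records = [
    {"id": 1, "email": "{{EMAIL_1}}", "card": "{{CC_1}}"},
    {"id": 2, "email": None, "card": "{{CC_2}}"},
    {"id": 3, "email": "{{EMAIL_2}}", "card": None},
]

# Count records that have an email token.
with_email = sum(1 for r in records if token_kind(r.get("email")) == "EMAIL")

# Filter by structure: card present but email missing.
card_without_email = [
    r["id"] for r in records
    if token_kind(r.get("card")) == "CC" and token_kind(r.get("email")) is None
]


def route(record):
    """Route on field presence only; queue names are hypothetical."""
    return "billing" if token_kind(record.get("card")) == "CC" else "general"


queues = {r["id"]: route(r) for r in records}
```

None of these steps needs the token map, which is why they can run on the model-facing side of the boundary.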

Deterministic rules, not model judgment

Deterministic rules in the execution environment enforce the boundary, not model judgment. The model does not decide what is sensitive; the sandbox does.

Model judgment is probabilistic. An instruction like "do not include email addresses in your reasoning" is only a prompt: the model may follow, ignore, or misinterpret it. A sandbox that intercepts and replaces every field matching ^[\w.-]+@[\w.-]+$ before data reaches the model is a deterministic control that the model cannot reason around, because the substitution has already happened by the time the model sees the data.
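A minimal sketch of such a rule, using the pattern quoted above, might look like this. The function name `redact_emails` and the `seen` dictionary (standing in for the session token map) are assumptions for illustration.

```python
import re

# The anchored pattern from the text: matches fields that are entirely an email.
EMAIL_FIELD = re.compile(r"^[\w.-]+@[\w.-]+$")


def redact_emails(record: dict, seen: dict) -> dict:
    """Replace every email-shaped field value with a deterministic token.

    `seen` maps real value -> token and plays the role of the session map.
    """
    out = {}
    for key, value in record.items():
        if isinstance(value, str) and EMAIL_FIELD.match(value):
            if value not in seen:
                seen[value] = "{{EMAIL_" + str(len(seen) + 1) + "}}"
            out[key] = seen[value]
        else:
            out[key] = value
    return out


seen = {}
first = redact_emails({"contact": "alice@example.com", "note": "call back"}, seen)
second = redact_emails({"cc": "alice@example.com", "to": "bob@example.com"}, seen)
```

The rule runs unconditionally on every record, so there is no prompt for the model to ignore. Because the pattern is anchored, it only matches fields that consist of nothing but an address; an email embedded in free text would pass through this particular rule untouched.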

Implementation considerations

  • Token determinism: the same real value must produce the same token within a session so the agent can correlate references across tool calls.
  • Token namespace by type: type-prefixed tokens ({{EMAIL_N}}, {{NAME_N}}) let the agent reason about field kind without seeing the value.
  • De-tokenization audit log: log every de-tokenization — which token, when, and for which downstream call.
  • Scope and expiry: keep tokens session-scoped. Short-lived maps reduce compliance exposure and support GDPR right-to-erasure — delete the map and de-tokenization becomes impossible by design.
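The audit-log and expiry considerations above can be sketched together. This is one possible shape under stated assumptions: `SessionTokenMap`, its `ttl_seconds` parameter, and the injectable `clock` are hypothetical, and the map lives in process memory.

```python
import time


class SessionTokenMap:
    """Token map with a time-to-live and a de-tokenization audit log (sketch)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        self._values = {}     # token -> real value
        self.audit_log = []   # records tokens and call names, never real values

    def store(self, token: str, value: str) -> None:
        self._values[token] = value

    def detokenize(self, token: str, call: str) -> str:
        if self._clock() >= self._expires_at:
            self.erase()
            raise KeyError("token map expired")
        value = self._values[token]  # raises KeyError if unknown or erased
        # Log which token, when, and for which downstream call.
        self.audit_log.append({"token": token, "call": call, "at": time.time()})
        return value

    def erase(self) -> None:
        """Delete the map; after this, de-tokenization cannot succeed."""
        self._values.clear()


# A fake clock makes the expiry behaviour easy to demonstrate.
now = [0.0]
session = SessionTokenMap(ttl_seconds=60, clock=lambda: now[0])
session.store("{{EMAIL_1}}", "alice@example.com")
address = session.detokenize("{{EMAIL_1}}", call="send_email")
```

Keeping real values out of the audit entries means the log itself can be retained after the map is erased.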

Example

A healthcare data-processing agent needs to triage patient records. Before any data enters the model context, the execution environment scans each record and replaces sensitive fields with typed tokens. The model receives {{NAME_1}}, {{EMAIL_1}}, and {{DOB_1}} instead of real values and can still count, filter, and route records based on field presence and structure.

When the agent issues send_summary(patient="{{NAME_1}}"), the sandbox intercepts the call, resolves the token against the session map, passes the real name to the downstream API, and logs the de-tokenization event with timestamp and call context.
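The interception step might be sketched as follows. `call_tool` is a hypothetical sandbox-side dispatcher, `send_summary` is a stand-in for the downstream API, and the session map is shown as a plain dictionary for brevity.

```python
import re
from datetime import datetime, timezone

# Finds tokens such as {{NAME_1}} anywhere inside a string argument.
TOKEN_RE = re.compile(r"\{\{[A-Z]+_\d+\}\}")


def resolve(value: str, arg: str, tool_name: str, token_map: dict, audit_log: list) -> str:
    """Swap tokens for real values and log each de-tokenization."""
    def swap(match):
        token = match.group(0)
        audit_log.append({
            "token": token,
            "tool": tool_name,
            "arg": arg,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return token_map[token]
    return TOKEN_RE.sub(swap, value)


def call_tool(tool, kwargs: dict, token_map: dict, audit_log: list):
    """Sandbox-side dispatcher: the model supplies kwargs containing tokens."""
    resolved = {
        name: resolve(value, name, tool.__name__, token_map, audit_log)
        if isinstance(value, str) else value
        for name, value in kwargs.items()
    }
    return tool(**resolved)


sent = []


def send_summary(patient: str) -> str:
    sent.append(patient)  # stand-in for the real downstream API call
    return "queued"       # the result returned to the model carries no PII


token_map = {"{{NAME_1}}": "Jane Smith"}
audit_log = []
result = call_tool(send_summary, {"patient": "{{NAME_1}}"}, token_map, audit_log)
```

The real name reaches only `send_summary`; the model gets back `"queued"`, and `audit_log` holds one entry naming the token, the tool, and the time.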

When this backfires

Tokenization is a boundary control, not a complete privacy solution. It fails or falls short in these conditions:

  • Detection gaps: regex-based PII detection misses contextual quasi-identifiers — job titles, internal employee IDs, composite fields. Google Cloud's de-identification reference architecture recommends post-tokenization re-identification risk analysis because pattern-matching alone leaves these gaps.
  • Safety gate interference: a field label that sits next to a token, as in SSN: {{IDENTIFIER_1}}, can trigger model safety refusals. The label signals sensitive data even without the value, so you have to strip or neutralize it, which adds complexity.
  • Overlong agent sessions: when session-scoped token maps span many hours or tool calls, the map itself becomes a high-value target. Long-lived maps need the same access controls as the underlying PII vault — treat the map as managed secrets.
  • Rich semantic tasks: agents asked to draft a personalized email or generate a narrative report need the actual values. Tokenization forces a de-tokenize-then-inject step that partially re-exposes data in tool inputs, which weakens the boundary.
  • Observability blind spots: traces, error reports, and request logs around the inference path often capture raw prompts and tool inputs that bypass the redaction layer. Practitioner reports attribute 25–40% of discovered PII exposure to observability surfaces even when the inference path itself was well-redacted. The audit log and any tracing pipeline that touches the sandbox must inherit the same access controls as the PII vault; see also PII redaction guidance for MCP servers on extending redaction to every returned artifact.
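For the observability gap in particular, one partial mitigation is to redact at the logging layer as well. The sketch below uses Python's standard `logging.Filter`; the class name `RedactEmails`, the logger name, and the untyped `{{EMAIL}}` placeholder are assumptions, and the filter covers only log message text, not exception tracebacks or other trace payloads.

```python
import io
import logging
import re

# Unanchored on purpose: log lines embed addresses inside longer text.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+")


class RedactEmails(logging.Filter):
    """Scrub email-shaped substrings from log messages before they are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then redact the result.
        record.msg = EMAIL_RE.sub("{{EMAIL}}", record.getMessage())
        record.args = None
        return True  # keep the record, now redacted


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactEmails())  # attach to the handler so every record passes it

log = logging.getLogger("sandbox.trace.example")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("tool input: to=%s", "alice@example.com")
output = stream.getvalue()
```

Attaching the filter to the handler rather than a single logger means records from child loggers that reach this handler are redacted too.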

Key Takeaways

  • Sensitive values should never appear in the model's context window; the sandbox is the privacy boundary.
  • Enforce tokenization with deterministic rules, not model judgment — instructions are insufficient controls.
  • Agents can still reason about structure, counts, and relationships using tokenized representations; filtering and aggregation can run inside the execution environment.
  • De-tokenization happens inside the sandbox when downstream tools require real values.
  • Log every de-tokenization event for audit traceability.