CLI Scripts as Agent Tools: Return Only What Matters

Write thin wrapper scripts that pre-filter system output so agents receive a decision-ready summary rather than raw command output to parse.


Raw commands waste context

When an agent runs kubectl get pods, it receives hundreds of lines for a production cluster. It may need only pod names and states. Context engineering (Anthropic) treats tool output as a direct context cost — the agent processes everything returned, useful or not. Anthropic reports a related pattern: executing code that filters MCP tool output before returning it to the model cut a representative workload from 150,000 to 2,000 tokens. Scripts that pre-filter at the source cut that cost.

The pattern

Write a wrapper script that runs the underlying command and returns only what the agent needs for the next decision:

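# Raw command: returns hundreds of lines
kubectl get pods -n production

# Wrapper script: returns what matters
#!/usr/bin/env bash
# Fail loudly if kubectl itself fails, so an outage is not reported as "all running"
pods=$(kubectl get pods -n production --no-headers) || exit 1
# --no-headers columns: NAME READY STATUS RESTARTS AGE; keep name, status, restarts
out=$(printf '%s\n' "$pods" \
  | awk 'NF && $3 != "Running" {print $1, $3, $4}' \
  | head -20)
# A pipeline's exit status comes from head (always 0), so test the output instead of chaining ||
echo "${out:-All pods running}"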

The agent receives "payments-worker CrashLoopBackOff 4" instead of the full pod table. One line rather than a page.

Return structured output (JSON or concise text) when possible. Structured output is easier to parse and less likely to be misinterpreted.

High-value applications

Scripts as agent tools work best for:

Log queries: grep and awk pipelines that extract error counts or time-windowed summaries rather than streaming raw logs.
Database lookups: queries that return specific records or aggregates, not table dumps — "ORDER-4821: shipped 2026-03-07" rather than all joined columns.
API status checks: scripts that return a single status field or formatted summary, not the full JSON response.
Cloud resource inspection: scripts that return resource names, states, and anomalies — not raw API output with timestamps and metadata.
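As an illustration of the log-query case above, here is a minimal sketch of a summarizing filter. The function name log_summary, the ERROR and WARN level tokens, and the sample log lines are assumptions made for this example, not part of any real system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a log stream read from stdin.
# Assumes each line carries a level token such as ERROR or WARN.
log_summary() {
  awk '
    /ERROR/ { errors++; last = $0 }   # count errors and remember the latest one
    /WARN/  { warnings++ }            # count warnings
    END {
      printf "errors: %d\nwarnings: %d\n", errors, warnings
      if (errors > 0) printf "last_error: %s\n", last
    }
  '
}

# Demo with inline sample data; in practice, pipe a real log file in.
log_summary <<'EOF'
2026-03-07T14:21:58Z INFO worker started
2026-03-07T14:22:00Z WARN retrying connection
2026-03-07T14:22:01Z ERROR connection refused
EOF
```

The agent receives at most three key: value lines regardless of how long the log is, and the counts are zero rather than absent when nothing matches.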

Abstraction and access control

Scripts separate the agent's interface from the underlying system. If a Kubernetes cluster moves to a different orchestrator, the script changes — the tool interface does not.

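Scripts also enforce read-only access as a side effect. A status script exposes no write operations, so the agent cannot change the system through it. This holds only when the script is the agent's sole route to that system; an agent with unrestricted shell access can still run the raw command.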
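One way to make the interface explicit is a small dispatcher that accepts only named read-only actions. This is a sketch under assumptions: the function name cluster_status, the two allowed actions, and the overridable KUBECTL variable are all hypothetical and chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical read-only entry point. KUBECTL is overridable so the
# function can be exercised without a cluster (an assumption of this sketch).
KUBECTL="${KUBECTL:-kubectl}"

cluster_status() {
  local action="$1" namespace="${2:-production}"

  # Accept only plain namespace names so an argument cannot smuggle in flags.
  case "$namespace" in
    ''|-*|*[!a-z0-9-]*)
      echo "error: invalid namespace" >&2
      return 2 ;;
  esac

  # Only these read-only actions exist; everything else is rejected.
  case "$action" in
    pods)        "$KUBECTL" get pods -n "$namespace" --no-headers ;;
    deployments) "$KUBECTL" get deployments -n "$namespace" --no-headers ;;
    *)
      echo "error: unsupported action '$action' (allowed: pods, deployments)" >&2
      return 2 ;;
  esac
}
```

If the orchestrator changes, only the bodies of the two case branches change; callers keep using cluster_status pods and cluster_status deployments.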

Design checklist

When you write a CLI script for an agent to use:

Filter at the source: remove irrelevant columns, rows, and metadata before returning.
Aggregate where you can: a count beats a list when the count is enough.
Format for parsing: JSON or key: value pairs are easier to process than columnar text.
Bound the output: use head or query limits so large data sets cannot produce runaway responses.
Return a clear empty state: "No errors found" is more useful than empty output that the agent may misread.
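The five checklist items can be combined in one short filter. The sketch below assumes input rows in the default kubectl get pods --no-headers layout (NAME READY STATUS RESTARTS AGE); the function name unhealthy_summary and the output keys are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter applying the checklist to pod rows read from stdin.
# Usage: kubectl get pods --no-headers | unhealthy_summary [LIMIT]
unhealthy_summary() {
  local limit="${1:-20}"
  awk -v limit="$limit" '
    NF && $3 != "Running" {              # filter at the source
      total++                            # aggregate: keep a running count
      if (total <= limit)                # bound the output
        printf "pod: %s status: %s restarts: %s\n", $1, $3, $4   # format for parsing
    }
    END {
      if (total == 0) { print "No unhealthy pods found"; exit }  # clear empty state
      printf "unhealthy_total: %d\n", total
      if (total > limit) printf "truncated: showing first %d\n", limit
    }
  '
}
```

The total is reported even when rows are cut off, so the agent can tell a small problem from a cluster-wide one without seeing every row.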

Agents writing and iterating on scripts

A related pattern is agents writing bash scripts, as in batch file operations, and iterating through a write-execute-debug cycle. Bash starts almost instantly, needs no compilation step, and surfaces errors in the same terminal the agent already occupies. Each iteration costs seconds.

Give the agent clear input and output contracts so the first attempt is closer to correct:

Write a bash script that:
- Input: a directory path as $1
- Output: JSON array of files larger than 10MB with name and size
- Exit 1 with a message if the directory does not exist
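For reference, a script meeting this contract might look like the sketch below. It makes several assumptions the contract leaves open: "10MB" is read as 10 MiB (10485760 bytes), "name" is the path as find prints it, and file names containing newlines or control characters are not handled. The function name large_files_json is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical implementation of the contract above, written as a function.
large_files_json() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "error: directory not found: $dir" >&2
    return 1
  fi

  # The "c" suffix means bytes and is POSIX find syntax; +N means "more than N".
  find "$dir" -type f -size +10485760c | sort | {
    sep=''
    printf '['
    while IFS= read -r f; do
      size=$(wc -c < "$f" | tr -d ' ')    # BSD wc pads its output with spaces
      # Escape backslashes and double quotes so the path is a valid JSON string.
      name=$(printf '%s' "$f" | sed 's/\\/\\\\/g; s/"/\\"/g')
      printf '%s{"name":"%s","size":%s}' "$sep" "$name" "$size"
      sep=','
    done
    printf ']\n'
  }
}
```

To satisfy the "exit 1" clause when this is saved as a standalone script, the last line would be large_files_json "$1" || exit 1. An empty result prints [] rather than nothing.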

Keep scripts modular: one large script is harder to debug when a single part fails. Design the architecture yourself and hand individual components to the agent. Once a script works, have the agent add comments documenting its platform assumptions (GNU versus BSD tools), which a PostToolUse hook can detect, so that future changes are easier.

Best for data-processing pipelines, build automation, and exploratory prototyping. Less suited to tasks that need complex data structures, type safety, or cross-platform compatibility.

When this backfires

Script maintenance burden. Wrappers hard-code assumptions about command output. When the underlying CLI changes its schema — a new column, a renamed field, a different exit code — the script silently breaks or returns wrong output. Every wrapper is a point you must keep in sync.
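One possible mitigation, offered here as a sketch rather than something the pattern requires, is to check the header row before parsing, so a changed column layout fails loudly instead of silently. The expected header below is the default kubectl get pods layout; the function name guarded_unhealthy and exit code 3 are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical guard: expects a table WITH its header row on stdin.
EXPECTED_HEADER="NAME READY STATUS RESTARTS AGE"

guarded_unhealthy() {
  awk -v expected="$EXPECTED_HEADER" '
    NR == 1 {
      $1 = $1                      # rebuild $0 with single spaces between fields
      if ($0 != expected) {
        printf "error: unexpected columns: %s\n", $0 > "/dev/stderr"
        exit 3                     # refuse to parse a layout we do not recognize
      }
      next
    }
    NF && $3 != "Running" { print $1, $3, $4 }
  '
}
```

Usage would be kubectl get pods -n production | guarded_unhealthy, that is, without --no-headers so the header is available to check.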

Over-filtering hides signals. A script that filters to "only errors" and offers no graceful-truncation contract (a useful prefix plus a continuation handle) will miss warnings that precede errors, as well as status changes the agent needs. Pre-filtering bets that the author knew exactly what the agent would need, and that bet goes wrong when incident types change.
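A graceful-truncation contract can be sketched as a pager that returns a prefix and tells the caller how to ask for more. The function name page_lines and the footer wording are hypothetical; the offset plays the role of the continuation handle.

```shell
#!/usr/bin/env bash
# Hypothetical pager: print LIMIT lines of stdin after skipping OFFSET lines,
# then a footer that acts as a continuation handle if anything was left out.
page_lines() {
  local offset="${1:-0}" limit="${2:-20}"
  awk -v off="$offset" -v lim="$limit" '
    NR > off && NR <= off + lim { print }
    END {
      if (NR > off + lim)
        printf "-- %d more lines; call again with offset=%d\n", NR - off - lim, off + lim
    }
  '
}
```

A wrapper could end its pipeline with page_lines "$OFFSET" 20 instead of head -20, so the agent sees the same bounded output but knows that more exists and how to reach it.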

Hard to debug through the abstraction. When an agent takes a wrong action, tracing the cause through a wrapper adds a layer to the investigation. To see what the raw command (for example kubectl get pods) returned, you have to run it manually.
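One way to soften this, sketched here under assumptions, is to have the wrapper keep the raw output on disk before filtering it. The function name run_with_raw_log, the RAW_LOG_DIR variable, and the file naming scheme are hypothetical. The pointer to the raw file goes to stderr on the assumption that the tool harness does not forward stderr to the model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, save its raw stdout, then emit it for filtering.
RAW_LOG_DIR="${RAW_LOG_DIR:-${TMPDIR:-/tmp}/agent-tool-raw}"

run_with_raw_log() {
  local name="$1"; shift
  mkdir -p "$RAW_LOG_DIR" || return 1
  local raw
  raw="$RAW_LOG_DIR/$name.$(date +%Y%m%dT%H%M%S).$$.log"

  "$@" > "$raw"                  # run the real command, capturing raw stdout
  local rc=$?

  echo "raw_output: $raw" >&2    # pointer for humans, kept off stdout
  cat "$raw"                     # hand the raw output to the downstream filter
  return "$rc"
}
```

A wrapper would then start with run_with_raw_log check_pods kubectl get pods -n production --no-headers and pipe the result into its usual awk filter, leaving a timestamped copy of what the command returned for later inspection.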

These conditions most often arise in novel incident types, rapidly evolving CLIs, and exploratory tasks where broad context beats a narrow summary.

Example

An agent investigating a failing deployment needs to check which pods are unhealthy and retrieve recent error logs. Without wrapper scripts, it issues two raw commands and parses hundreds of lines. With wrapper scripts registered as tools, the agent gets decision-ready output.

check-pods.sh — registered as the check_pods tool:

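#!/usr/bin/env bash
# Returns non-running pods in a namespace as a JSON array
NAMESPACE="${1:-production}"
# Exit non-zero if kubectl fails, so an outage is not reported as an empty list
pods=$(kubectl get pods -n "$NAMESPACE" --no-headers) || exit 1
# --no-headers columns: NAME READY STATUS RESTARTS AGE
# jq -s emits [] for empty input, so no separate empty-state fallback is needed
printf '%s\n' "$pods" \
  | awk 'NF && $3 != "Running" {printf "{\"pod\":\"%s\",\"status\":\"%s\",\"restarts\":%s}\n", $1, $3, $4}' \
  | jq -s '.'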

pod-errors.sh — registered as the pod_errors tool:

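#!/usr/bin/env bash
# Returns last 5 error-level log lines for a given pod
POD="${1:?usage: pod-errors.sh POD [NAMESPACE]}"
NAMESPACE="${2:-production}"
# Exit non-zero if kubectl fails, so a missing pod is not reported as "No errors found"
logs=$(kubectl logs "$POD" -n "$NAMESPACE" --tail=200) || exit 1
# tail always exits 0, so test for empty output instead of chaining || echo
errors=$(printf '%s\n' "$logs" | grep -iE "error|fatal|panic" | tail -5)
echo "${errors:-No errors found}"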

The agent calls check_pods and receives:

[{"pod":"payments-worker-7f8b9","status":"CrashLoopBackOff","restarts":4}]

It then calls pod_errors("payments-worker-7f8b9") and receives:

2026-03-07T14:22:01Z FATAL: connection refused: payments-db:5432

Two tool calls, two concise responses. The agent identifies the root cause (database connection failure) without parsing full pod tables or scrolling through log streams.

Key Takeaways

Raw CLI output is a direct context expenditure — wrapper scripts that pre-filter at the source reduce that cost.
Return structured, decision-ready output (JSON or concise text); bound the output and include a clear empty state.
Scripts double as an abstraction layer and enforce read-only access as a side effect.
Bash enables a tight write-execute-debug cycle: specify input/output signatures, keep scripts modular, document edge cases after iteration.
Wrappers backfire when the underlying CLI schema changes, when over-filtering hides signals, or during exploratory tasks that need broad context.