Circuit Breakers for Agent Loops¶
Circuit breakers stop agent loops when progress stalls: repeated errors, escalating costs, context exhaustion, and circular behavior are all signals to halt rather than continue.
Related lesson: Cost Controls and Circuit Breakers — this concept features in a hands-on lesson with quizzes.
Also known as: Loop Detection and Stopping. To detect repetitive edits within a session, see Loop Detection.
The loop problem¶
Agents in open-ended loops consume resources without making progress. They apply the same wrong fix, or retry a flaky test 20 times. Without stopping conditions, a loop runs until the context window fills or the session is killed. That leaves degraded context and unusable partial output.
Stopping signals¶
Five signals warrant a circuit break:
- Iteration limit reached. The agent has taken N steps without completing the task. N varies by task type. Claude Code sub-agents support a `maxTurns` field (Claude Code sub-agents docs) that enforces this at the runtime level.
- Repeated failure. The same tool call fails repeatedly with the same error. A 429 three times in a row will continue to 429. A test failing for a logic error will continue to fail.
- Repetition detected. The agent is doing what it already did — fetching the same URL, reading the same file, attempting the same fix. Repetition without new information is a stuck loop.
- Context budget exceeded. The context window is approaching the "dumb zone" where output quality degrades. Chroma's Context Rot study tested 18 frontier models including GPT-4.1, Claude Opus 4, and Gemini 2.5. It found every model degrades non-uniformly as input grows, with onset depending on task similarity and distractor density. Trip the breaker when recall or coherence drops on your task, not at a fixed token count.
- Cost threshold exceeded. The task has consumed more than the expected budget. Cost overrun often correlates with loops.
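The signals above can be sketched as a small tracker that an agent harness consults after every step. This is a minimal illustration, not any framework's API: `LoopBreaker`, its thresholds, and the action strings are all hypothetical. It covers four of the five signals. The context-budget signal is left out because, as noted above, it should be driven by a task-specific quality measurement rather than a fixed count.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopBreaker:
    """Tracks four stopping signals for one agent session (hypothetical helper)."""

    max_steps: int = 20          # iteration limit (N); tune per task type
    max_same_error: int = 3      # consecutive identical failures tolerated
    max_same_action: int = 2     # times an identical action may be repeated
    max_cost_usd: float = 5.0    # session budget (illustrative value)

    steps: int = 0
    cost_usd: float = 0.0
    last_error: Optional[str] = None
    same_error_streak: int = 0
    actions: Counter = field(default_factory=Counter)

    def record(
        self, action: str, error: Optional[str] = None, cost_usd: float = 0.0
    ) -> Optional[str]:
        """Record one step; return the reason to stop, or None to continue."""
        self.steps += 1
        self.cost_usd += cost_usd
        self.actions[action] += 1

        # Count consecutive occurrences of the same error; a success or a
        # different error restarts the streak.
        if error is not None and error == self.last_error:
            self.same_error_streak += 1
        else:
            self.same_error_streak = 1 if error is not None else 0
        self.last_error = error

        if self.same_error_streak >= self.max_same_error:
            return f"repeated failure: {error}"
        if self.actions[action] > self.max_same_action:
            return f"repetition: {action}"
        if self.cost_usd > self.max_cost_usd:
            return "cost threshold exceeded"
        if self.steps >= self.max_steps:
            return "iteration limit reached"
        return None
```

A harness would call `record(...)` once per tool call and stop the loop as soon as it returns a reason string.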
Graceful degradation¶
When a circuit breaker trips, the agent should:
- Stop running new actions.
- Return the partial results finished before the break.
- Explain what triggered the stop and what remains incomplete.
- Escalate to a human if the pipeline has a human gate.
Partial results are more useful than nothing. Return what you have. Do not discard completed work.
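One way to make these four steps concrete is to have the loop return a structured report rather than raising when the breaker trips. The sketch below is an assumption-laden illustration: `RunReport`, `run_with_breaker`, and the `should_stop` callback are hypothetical names, and the "tasks" stand in for whatever unit of work the agent performs.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class RunReport:
    completed: List[str] = field(default_factory=list)  # partial results, always kept
    remaining: List[str] = field(default_factory=list)  # work that was not finished
    stop_reason: Optional[str] = None                   # None means the run finished
    needs_human: bool = False                           # escalate if a human gate exists


def run_with_breaker(
    tasks: Iterable[str],
    do_task: Callable[[str], str],
    should_stop: Callable[[int], Optional[str]],
    has_human_gate: bool = False,
) -> RunReport:
    """Run tasks in order; on a trip, stop and report instead of discarding work."""
    report = RunReport()
    pending = list(tasks)
    while pending:
        # should_stop receives the number of completed tasks and returns a reason or None.
        reason = should_stop(len(report.completed))
        if reason is not None:
            report.stop_reason = reason           # explain what triggered the stop
            report.remaining = pending            # and what remains incomplete
            report.needs_human = has_human_gate   # escalate only if the pipeline has a gate
            break                                 # stop running new actions
        task = pending.pop(0)
        report.completed.append(do_task(task))
    return report
```

The caller always gets `completed` back, whether or not the run finished, so nothing already done is thrown away.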
Configuration¶
| Signal | Configuration | Enforcement |
|---|---|---|
| Iteration limit | `maxTurns` in agent frontmatter | Runtime |
| Cost threshold | Session budget settings | Runtime |
| Error rate | Agent instruction + hook | Instruction / hook |
| Repetition | Agent instruction + hook | Instruction / hook |
| Context usage | Agent instruction | Instruction |
The model cannot override runtime enforcement (maxTurns, cost budgets). Instruction-level enforcement depends on the model obeying instructions, so it is less reliable for safety-critical stops. Hooks offer a middle ground: deterministic scripts that monitor and trigger a stop.
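As a rough sketch of the hook option, the script below counts identical tool calls and signals a block once a limit is passed. It rests on assumptions that should be checked against the hooks reference before use: that the hook receives the pending call as JSON on stdin with `tool_name` and `tool_input` fields, and that exit code 2 blocks the call and passes stderr back to the model. The state file name and the `check_call` and `main` helpers are hypothetical.

```python
import json
import sys
from typing import Dict, Tuple


def check_call(payload: dict, seen: Dict[str, int], limit: int = 3) -> Tuple[int, str]:
    """Return (exit_code, message) for one tool call.

    `seen` maps a call fingerprint to how many times it has occurred so far.
    """
    # Assumed payload fields: "tool_name" and "tool_input".
    fingerprint = json.dumps(
        [payload.get("tool_name"), payload.get("tool_input")], sort_keys=True
    )
    seen[fingerprint] = seen.get(fingerprint, 0) + 1
    if seen[fingerprint] > limit:
        # Assumed convention: exit code 2 blocks the call and surfaces stderr.
        name = payload.get("tool_name")
        return 2, f"Circuit breaker: identical {name} call repeated {seen[fingerprint]} times."
    return 0, ""


def main(state_path: str = ".breaker_state.json") -> int:
    """Entry point when run as a hook: payload on stdin, counts kept in a state file."""
    try:
        with open(state_path) as f:
            seen = json.load(f)
    except FileNotFoundError:
        seen = {}
    code, message = check_call(json.load(sys.stdin), seen)
    with open(state_path, "w") as f:
        json.dump(seen, f)
    if message:
        print(message, file=sys.stderr)
    return code
```

To deploy it as a script, end the file with `sys.exit(main())`. A real setup would also key the state file by session so that counts do not leak between runs.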
Circuit breakers are the enforcement mechanism for context health — without them, context management guidelines are advisory rather than operational limits.
Example¶
This Claude Code sub-agent definition combines a runtime-enforced maxTurns limit with an instruction-level check for repeated failure. Both signals are present, so the agent stops whether it hits the turn ceiling or runs into a recurring error.
```yaml
# .claude/agents/research-agent.md frontmatter
---
name: research-agent
description: Fetches and summarises web sources for a given topic
tools:
  - WebFetch
  - Read
  - Write
maxTurns: 20
---
```

```markdown
# Research Agent System Prompt
You are a research agent. Fetch and summarise up to 5 sources for the given topic.
## Circuit-breaker rules
1. **Iteration limit** — enforced by `maxTurns: 20` above; you will be stopped automatically.
2. **Repeated failure** — if the same URL returns an error three times in a row, skip it,
   note it as unreachable, and move to the next source. Do not retry indefinitely.
3. **Repetition detection** — if you find yourself fetching a URL you have already fetched
   this session, stop and return what you have collected so far.
4. **Partial results** — when you stop for any reason before completing all 5 sources,
   return the summaries you have already written plus a short note explaining what
   triggered the stop and which sources were not completed.
```
The maxTurns: 20 field is enforced at the Claude Code runtime level and cannot be overridden by model reasoning. The instruction-level checks handle error-rate and repetition signals, which the runtime does not detect automatically.
When this backfires¶
Circuit breakers detect failure modes; they do not guarantee correctness. Set too aggressively, they become the failure mode. A reasonable practitioner would push back in at least four situations:
- Iteration limits trip on legitimate work. Setting `maxTurns` low enough to catch pathological loops also cuts off legitimate multi-step refactors or research tasks. Several production frameworks have open issues where agents halt mid-task on "stopped due to max iterations" even when making forward progress (openai-agents-python#844, langflow#10607). Raise the ceiling for task classes that legitimately need 50+ turns.
- Repetition detection flags valid re-reads. Re-reading the same file after an edit, or refetching a URL after a 429 backoff, are normal behaviors, not stuck loops. Naive "did we already fetch this?" heuristics fire on both.
- Cost thresholds penalize exploration. Exploratory research agents legitimately consume variable budgets. A hard cost cap trips on successful discovery runs as readily as on loops. The signal is cost without progress, not cost alone.
- Instruction-level stops are model-dependent. The repeated-failure, repetition, and context-budget signals (2, 3, and 4) rely on the model reading its own circuit-breaker rules and obeying them. If the model ignores the instruction mid-reasoning, the stop never fires. For safety-critical stops, prefer runtime enforcement (`maxTurns`, hooks) over instructions.
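The false positives on re-reads and post-backoff refetches can be reduced by flagging a repeat only when nothing has changed since the action last ran. The sketch below is one hypothetical way to do that. `ProgressAwareRepeats` is an illustrative name, and deciding what counts as progress (an edit, a completed backoff, a new result) is left to the harness.

```python
from typing import Set, Tuple


class ProgressAwareRepeats:
    """Flags a repeated action only if no progress was noted since it last ran."""

    def __init__(self) -> None:
        self.version = 0                         # bumped whenever new information arrives
        self.seen: Set[Tuple[str, int]] = set()  # (action, version when it ran)

    def note_progress(self) -> None:
        """Call after an edit, a completed backoff wait, or any new result."""
        self.version += 1

    def is_stuck(self, action: str) -> bool:
        """True if this exact action already ran at the current version."""
        key = (action, self.version)
        if key in self.seen:
            return True
        self.seen.add(key)
        return False
```

The same idea carries over to cost: compare spend against the number of `note_progress()` calls, so the breaker reacts to cost without progress rather than to cost alone.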
The steelman: if your agents already fail gracefully on their own — return partial results, detect their own thrash — then another stopping layer mostly creates false positives. Instrument first. Add breakers where instrumentation shows real loops, not as a precaution.
Key Takeaways¶
- Five stopping signals: iteration limit, repeated failure, repetition, context budget, cost threshold
- `maxTurns` provides runtime-enforced iteration limits; instruction-based checks can be overridden by the model
- Graceful degradation: return partial results + failure explanation, never discard completed work
Related¶
- Agent Circuit Breaker — tool-level state machine that blocks calls to degraded external tools; complementary to loop-level breakers here
- Loop Detection
- Trajectory Logging via Progress Files and Git History
- Human-in-the-Loop Placement: Where to Gate Agent Pipelines
- Idempotent Agent Operations: Safe to Retry
- Context Window Management: The Dumb Zone
- Agent Debugging: Diagnosing Bad Agent Output
- Agent Observability in Practice: OTel, Cost Tracking, and Trajectory Logging