Harness Design Dimensions and Archetypes
A source-grounded study of 70 agent-system projects reduces harness infrastructure to five recurring design dimensions and five archetypes — a population-level lens for reading and comparing harnesses.
Why Dimensions Beat Ad-Hoc Comparison
Harness code — the non-LLM mediator handling tools, context, delegation, safety, and orchestration — determines agent behaviour as much as the model. Independent evidence: pure harness changes took Terminal Bench 2.0 from 52.8% to 66.5% (LangChain). Projects therefore diverge sharply in how they engineer this layer.
Hu Wei (2026), "Architectural Design Decisions in AI Agent Harnesses", analyses 70 publicly available agent-system projects through their source code and technical material. The output is a shared vocabulary for reading harness choices at the ecosystem level.
Five Design Dimensions
graph TD
H[Agent Harness] --> SA[Subagent Architecture]
H --> CM[Context Management]
H --> TS[Tool Systems]
H --> SM[Safety Mechanisms]
H --> OR[Orchestration]
Each dimension is a choice of position along a spectrum, not a binary switch (arXiv:2604.18071):
- Subagent architecture — flat, hierarchical, or peer coordination between specialised agents.
- Context management — file-persistent, hybrid, and hierarchical strategies dominate the corpus; ephemeral in-memory is rarer in production systems.
- Tool systems — registry-oriented systems are dominant; MCP- and plugin-oriented extensions are emerging.
- Safety mechanisms — intermediate isolation (sandboxes, permission prompts) is common; high-assurance audit (provenance-aware decision auditing) is rare.
- Orchestration — the control flow and scheduling layer around agent loops.
The paper complements the 12-dimension, 13-system scaffold-architecture taxonomy (arXiv:2604.03515) covered under "The 12-Dimension Scaffold Taxonomy" later on this page, which offers finer-grained analysis of individual scaffolds at lower population coverage. Pick the five-dimension view for cross-ecosystem reading; pick the 12-dimension view when characterising a single scaffold in depth.
Co-occurrence: Choices Cluster
Design dimensions are not independent. arXiv:2604.18071 reports three recurring clusters:
| Cluster | What pairs with what |
|---|---|
| Coordination ↔ context | Deeper subagent coordination pairs with more explicit context services |
| Execution ↔ governance | Stronger execution environments correlate with more structured governance |
| Tooling ↔ ecosystem | Formalised tool-registration boundaries align with broader ecosystem ambitions |
The implication for design: a single upgrade rarely lands in isolation. Adding multi-agent coordination without corresponding context services leaves agents starved of state; tightening tool boundaries without ecosystem commitments imposes cost without the reach that justifies it.
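A simple cross-tabulation is one way to look for this kind of clustering in a survey of your own. The sketch below is illustrative only: the six-project corpus, the label strings, and the `EXPLICIT_CONTEXT` grouping are invented assumptions, not data or definitions from the paper.

```python
from collections import Counter

# Hypothetical mini-corpus: (subagent architecture, context strategy) per project.
# These rows are invented for illustration; they are not data from the paper.
CORPUS = [
    ("flat", "ephemeral"),
    ("flat", "ephemeral"),
    ("flat", "file-persistent"),
    ("hierarchical", "file-persistent"),
    ("hierarchical", "hybrid"),
    ("peer", "hierarchical"),
]

# Assumption: these three strategies count as "explicit context services".
EXPLICIT_CONTEXT = {"file-persistent", "hybrid", "hierarchical"}


def cross_tab(rows):
    """Count how often each pair of dimension values appears together."""
    return Counter(rows)


def conditional_share(rows, given, target_values):
    """Share of projects whose first value is `given` and whose second value is in `target_values`."""
    matching = [second for first, second in rows if first == given]
    if not matching:
        return 0.0
    hits = sum(1 for second in matching if second in target_values)
    return hits / len(matching)


table = cross_tab(CORPUS)
flat_share = conditional_share(CORPUS, "flat", EXPLICIT_CONTEXT)
deep_share = conditional_share(CORPUS, "hierarchical", EXPLICIT_CONTEXT)
print(f"flat: {flat_share:.2f}, hierarchical: {deep_share:.2f}")
```

A large difference between the two shares would suggest, for that corpus, that deeper coordination travels with explicit context services. With a sample this small it is only a reading aid, not evidence.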
Five Archetypes
The same paper groups the 70 projects into five recurring archetypes (arXiv:2604.18071):
| Archetype | Profile |
|---|---|
| Lightweight tools | Minimal harness infrastructure; a thin loop around tool calls |
| Balanced CLI frameworks | Moderate complexity; CLI-oriented with adaptive loops and registry tools |
| Multi-agent orchestrators | Deep coordination, explicit context services, role-specialised subagents |
| Enterprise systems | Structured governance, stronger isolation, broader ecosystem scope |
| Scenario-verticalised projects | Domain-specific harnesses optimised for one class of workflow |
Archetypes are descriptive clusters, not prescriptions. A project's archetype emerges from the dimension choices that reinforce each other — which is why the co-occurrence clusters matter more than any individual dimension.
Reading a Harness with the Framework
Apply the five dimensions in order when evaluating or designing a harness:
- Where on the subagent spectrum — single loop, delegated roles, or peer coordination?
- Which context strategy — file-persistent, hybrid, hierarchical, or ephemeral?
- Which tool system — direct shell, typed registry, MCP, or plugin?
- Which safety posture — none, intermediate isolation, or high-assurance audit?
- Which orchestration layer — fixed pipeline, adaptive loop, or external scheduler?
Read the cluster alignments to predict where effort is missing: a project with multi-agent coordination but no file-persistent context is likely under-invested on context services; one with formal tool registration but no ecosystem scope is paying integration cost without reach.
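The checklist and the two gap predictions above can be written down as a small record plus a rule function. This is a minimal sketch under stated assumptions: `HarnessProfile`, `predicted_gaps`, the label strings, and the `ecosystem_scope` flag are hypothetical names, and treating "ephemeral" as a proxy for "no explicit context services" is a simplification rather than the paper's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HarnessProfile:
    """One position per dimension (label strings are illustrative)."""
    subagents: str      # "single-loop" | "delegated" | "peer"
    context: str        # "file-persistent" | "hybrid" | "hierarchical" | "ephemeral"
    tools: str          # "direct-shell" | "typed-registry" | "mcp" | "plugin"
    safety: str         # "none" | "intermediate" | "high-assurance"
    orchestration: str  # "fixed-pipeline" | "adaptive-loop" | "external-scheduler"
    ecosystem_scope: bool = False  # hypothetical flag: targets third-party extensions?


def predicted_gaps(profile: HarnessProfile) -> List[str]:
    """Flag dimension pairs where one side of a co-occurrence cluster is missing."""
    gaps = []
    multi_agent = profile.subagents in {"delegated", "peer"}
    if multi_agent and profile.context == "ephemeral":
        gaps.append("coordination without explicit context services")
    formal_tools = profile.tools in {"typed-registry", "mcp", "plugin"}
    if formal_tools and not profile.ecosystem_scope:
        gaps.append("formal tool registration without ecosystem scope")
    return gaps


research_swarm = HarnessProfile(
    subagents="peer",
    context="ephemeral",
    tools="mcp",
    safety="intermediate",
    orchestration="external-scheduler",
)
for gap in predicted_gaps(research_swarm):
    print("possible under-investment:", gap)
```

The rules only point at where to look. A flagged gap may be a deliberate trade-off, so it is a prompt for review rather than a verdict.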
When the Framework Under-Delivers
- Single-script tools — only one or two dimensions are meaningful; the archetype collapses to "lightweight tools" without informing design.
- Pre-production prototypes — co-occurrence patterns assume differentiated systems; early harnesses are not yet clustered.
- In-house vertical harnesses — the archetype is predetermined by the domain, so the framework adds vocabulary without decision support.
Example
Reading two public harnesses through the dimensions:
Harness A, a terminal coding agent: single control loop (flat subagent), accumulated in-memory context with summarisation on threshold, typed tool registry exposed as a shell-like interface, permission prompts before destructive actions, adaptive orchestration. Archetype: balanced CLI framework. Co-occurrence check: with no multi-agent coordination there is little need for explicit context services, which is consistent with its single in-memory context.
Harness B, a multi-agent research system: hierarchical subagents with orchestrator-worker topology, file-persistent progress files and hybrid per-agent context, plugin-style tool registration with MCP extensions, sandbox isolation and audit logging, external scheduler driving orchestration. Archetype: multi-agent orchestrator / enterprise. Co-occurrence checks pass: deep coordination is paired with explicit context services, and formal tool registration is paired with ecosystem scope.
The dimensions frame the differences; the archetypes name the clusters.
The 12-Dimension Scaffold Taxonomy (Single-Scaffold View)
For characterising one scaffold in depth, source-code analysis of 13 open-source coding agent scaffolds reduces the same harness layer to 12 dimensions grouped into three layers. Architecturally distinct systems can expose identical surface capabilities, and trajectory studies observe outputs without explaining where the differences come from, so the taxonomy makes the underlying design choices comparable.
graph TD
S[Coding Agent Scaffold] --> CA[Control Architecture]
S --> TEI[Tool & Environment Interface]
S --> RM[Resource Management]
CA --> CA1[Loop topology]
CA --> CA2[Planning strategy]
CA --> CA3[Search / branching]
CA --> CA4[Error recovery]
TEI --> TEI1[Tool abstraction level]
TEI --> TEI2[Environment access model]
TEI --> TEI3[Feedback routing]
TEI --> TEI4[Output typing]
RM --> RM1[Context budget strategy]
RM --> RM2[State persistence]
RM --> RM3[Tool-call capping]
RM --> RM4[Cost guardrails]
Layer 1 — Control architecture decides what to do next and when to stop. Loop topology is a spectrum: fixed pipelines run a predetermined sequence; adaptive loops react to tool output; MCTS scaffolds build a search tree with backtracking — Moatless Tools implements full MCTS with numeric reward and backpropagation (arXiv:2604.03515).
| Topology | Predictability | Compute | Best for |
|---|---|---|---|
| Fixed pipeline | High | Low | Well-defined, repeatable tasks |
| Adaptive loop | Medium | Medium | Observation-reaction cycles |
| MCTS / search | Low | High | Unknown solution paths |
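The first two rows of the table can be contrasted in a few lines. This is a toy sketch, not code from any scaffold in the study: `run_fixed_pipeline`, `run_adaptive_loop`, and the string-based stages are hypothetical stand-ins for real tools and model calls.

```python
from typing import Callable, List, Tuple

Step = Callable[[str], str]


def run_fixed_pipeline(task: str, stages: List[Step]) -> str:
    """Run every stage exactly once, in order, regardless of intermediate results."""
    state = task
    for stage in stages:
        state = stage(state)
    return state


def run_adaptive_loop(
    task: str,
    choose_action: Callable[[str], Step],
    is_done: Callable[[str], bool],
    max_steps: int = 10,
) -> Tuple[str, int]:
    """Choose the next action from the latest observation until done or capped."""
    observation = task
    steps = 0
    while steps < max_steps and not is_done(observation):
        action = choose_action(observation)
        observation = action(observation)
        steps += 1
    return observation, steps


# Toy stand-ins for real stages and tools (hypothetical).
pipeline_result = run_fixed_pipeline("  Fix Bug  ", [str.strip, str.lower])


def choose(observation: str) -> Step:
    # A real harness would ask the model here; this stub always appends a marker.
    return lambda text: text + "!"


loop_result, steps_taken = run_adaptive_loop(
    "go", choose, lambda text: text.endswith("!!!")
)
print(pipeline_result, loop_result, steps_taken)
```

The pipeline's step count is fixed by the length of `stages`, which is where its predictability comes from. The loop's step count depends on what it observes, so it needs an explicit cap such as `max_steps`.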
Planning strategy decides whether the scaffold reasons about future steps before acting (planning-first emits a plan then executes; interleaved adapts at the cost of inspectability). Error recovery ranges from aborting on first failure to retry loops, exception-specific handlers, and rollback to checkpoints.
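The recovery end of that range, retrying with rollback to a checkpoint, can be sketched as follows. The names `TransientError` and `run_with_recovery` are hypothetical, and the state is assumed to be a plain dictionary that can be deep-copied. A real scaffold would more likely checkpoint a working tree or a sandbox snapshot.

```python
import copy


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_recovery(state: dict, step, max_retries: int = 2):
    """Run `step(state)`; on a transient failure, restore the checkpoint and retry."""
    checkpoint = copy.deepcopy(state)
    for attempt in range(max_retries + 1):
        try:
            step(state)
            return state, attempt
        except TransientError:
            # Undo partial mutations so the next attempt starts from a clean state.
            state.clear()
            state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("step failed after retries; state restored to checkpoint")


calls = {"count": 0}


def flaky_step(state: dict) -> None:
    """Mutates state, then fails on the first call only (simulated tool timeout)."""
    state["edits"].append("edit")
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("tool timed out")
    state["edits"].append("tests-pass")


final_state, retries_used = run_with_recovery({"edits": []}, flaky_step)
print(final_state, retries_used)
```

Only `TransientError` is retried in this sketch. Any other exception propagates at once, which corresponds to the abort-on-first-failure end of the range.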
Layer 2 — Tool and environment interface. Tool abstraction level varies from direct shell (maximum flexibility, no boundary for testing) to typed registries that reject malformed calls and enable reasoning/execution separation. Environment access model sets what the agent can observe and modify (sandboxes give a recoverable surface). Feedback routing controls where tool results go — returning all output to context is simple but expensive; routing large outputs to disk with a summary preserves budget (Anthropic: Context Engineering).
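To make the typed-registry end of the abstraction spectrum concrete, here is a minimal sketch of a registry that validates arguments before anything executes. `ToolSpec`, `ToolRegistry`, and the `head` tool are hypothetical names for illustration, and the type check is deliberately shallow. Production registries commonly validate against a schema instead.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolSpec:
    name: str
    params: Dict[str, type]  # parameter name -> expected Python type
    run: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry: rejects malformed calls before any tool runs."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        spec = self._tools.get(name)
        if spec is None:
            raise KeyError(f"unknown tool: {name}")
        if set(args) != set(spec.params):
            raise TypeError(
                f"{name} expects {sorted(spec.params)}, got {sorted(args)}"
            )
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        return spec.run(**args)


registry = ToolRegistry()
registry.register(
    ToolSpec(
        name="head",
        params={"text": str, "limit": int},
        run=lambda text, limit: text.splitlines()[:limit],
    )
)

ok = registry.call("head", {"text": "a\nb\nc", "limit": 2})

try:
    registry.call("head", {"text": "a\nb\nc", "limit": "2"})  # wrong type
    rejected = False
except TypeError:
    rejected = True
print(ok, rejected)
```

Because validation happens in `call`, the reasoning side only has to produce a name and an argument dictionary. That boundary can then be tested without a model in the loop.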
Layer 3 — Resource management handles the bounded resources of a model-in-a-loop. Context budget strategy decides what enters context and when it is pruned (see Loop Strategy Spectrum). State persistence decides what survives between iterations: in-memory state is lost on failure, while file-backed state enables resumption via progress files and feature list files. Tool-call capping and cost guardrails limit otherwise unbounded loops per session, per tool, or by total spend.
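Caps per session, per tool, and on total spend can share one guard object that the loop consults before each tool call. This is a sketch under assumptions: `LoopBudget`, `BudgetExceeded`, and the dollar figures are hypothetical, and a real harness would take its costs from provider usage data rather than from hard-coded values.

```python
from collections import Counter


class BudgetExceeded(Exception):
    """Raised when a call would push the session past one of its limits."""


class LoopBudget:
    """Hypothetical guard: per-session, per-tool, and spend caps in one place."""

    def __init__(self, max_calls: int, max_calls_per_tool: int, max_cost_usd: float):
        self.max_calls = max_calls
        self.max_calls_per_tool = max_calls_per_tool
        self.max_cost_usd = max_cost_usd
        self.calls = Counter()
        self.cost_usd = 0.0

    def charge(self, tool: str, cost_usd: float) -> None:
        """Record one tool call, or raise without recording if a cap would be exceeded."""
        if sum(self.calls.values()) + 1 > self.max_calls:
            raise BudgetExceeded("session tool-call cap reached")
        if self.calls[tool] + 1 > self.max_calls_per_tool:
            raise BudgetExceeded(f"per-tool cap reached for {tool}")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost guardrail reached")
        self.calls[tool] += 1
        self.cost_usd += cost_usd


budget = LoopBudget(max_calls=3, max_calls_per_tool=2, max_cost_usd=1.0)
budget.charge("grep", 0.25)
budget.charge("grep", 0.25)

try:
    budget.charge("grep", 0.25)  # third grep call exceeds the per-tool cap
    stopped_by = None
except BudgetExceeded as exc:
    stopped_by = str(exc)
print(stopped_by, budget.cost_usd)
```

Checking before recording means a rejected call leaves the counters untouched. The loop can then fall back to a different tool or stop cleanly.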
Scaffold architectures resist discrete classification (arXiv:2604.03515): 11 of 13 agents analysed compose multiple loop primitives rather than implementing one. Treat dimensions as continuous scales: ask "where does this scaffold sit on the control strategy spectrum?" rather than "is this a pipeline or an agent?"
Reading three open-source scaffolds through the control layer:
- Agentless runs a 10-stage pipeline of independent scripts linked by JSONL on disk: predictable, auditable, cheap, but degrades when reproduction needs exploration.
- SWE-agent runs a single ReAct loop over a typed tool registry and restricted shell: more robust to unexpected paths, higher per-run cost.
- Moatless Tools runs full MCTS: strongest on open-ended tasks, highest compute, hardest to debug when a bad branch dominates.
The 12-dimension view adds overhead without value for single-script tools (no meaningful control architecture to classify) and retrospective audits (it tells you what was built, not whether the design was right).
Key Takeaways
- Five dimensions — subagent architecture, context management, tool systems, safety mechanisms, orchestration — cover the non-LLM choices in an agent harness.
- Dimension choices cluster: coordination with context services, execution with governance, tooling with ecosystem. Single-axis upgrades under-perform the paired investment.
- Five archetypes (lightweight, CLI, multi-agent, enterprise, verticalised) are descriptive clusters derived from the 70-project corpus, not prescribed templates.
- The framework is most useful at ecosystem level; pair it with a finer-grained taxonomy when characterising a single scaffold.
- Rare-in-corpus signals are actionable: high-assurance audit is uncommon, so any project claiming it should be verified, not assumed.
Related
- Agent Harness: Initializer and Coding Agent
- Harness Engineering
- Harness Hill-Climbing: Eval-Driven Iterative Improvement of Agent Harnesses
- Runtime Scaffold Evolution: Agents That Build Tools
- Cognitive Reasoning vs Execution: A Two-Layer Agent Architecture
- Managed vs Self-Hosted Harness
- Loop Strategy Spectrum
- Multi-Agent Topology Taxonomy