Intent-Centric Engineering: Oversight Over Authorship

When code generation is cheap, engineering leverage moves from authorship to specifying intent and governing humans, agents, tools, and evidence gates.

Intent-centric engineering is the operating model where the engineer's primary work is specifying what the system should do, designing the evidence gates that prove it does, and governing the socio-technical system — humans plus agents plus tools — that produces and verifies the code. Authorship is delegated; intent and oversight are not. The framing comes from De La Cruz's thematic analysis of GenAI software engineering: GenAI's paradoxical effect is to raise the value of intent specification, context curation, architectural judgment, and verification as it lowers the cost of code production (De La Cruz, arXiv:2605.11027).

It is a destination, not a default. Read "When this backfires" before adopting it as a generic prescription.

When the posture pays back

Three conditions make the intent-centric posture economically defensible:

  • Repeated or fanned-out generation. A one-shot task does not recoup the cost of an intent specification plus an evidence-gate harness. The investment pays off when agents iterate or fan out — the boundary Spec Complexity Displacement identifies for spec-driven development.
  • Verification capacity exists. The team must have, or build, mechanical evidence gates — tests, schemas, linters, security scans, automated review — that catch bug classes rather than rely on review judgment. Without that scaffold, "oversight" is ceremonial.
  • Reviewers can evaluate output. Junior teams that cannot assess agent output against a spec produce a rubber-stamp checkpoint — the failure mode already named for the merge button (Empowerment Over Automation).

When any condition fails, build deterministic harnesses and verification capacity first.

The mechanism

Code generation accelerates production faster than human review scales. Faros AI data from high-adoption teams shows 98% more PRs merged but 91% longer review times — generation roughly doubled, review capacity did not (Osmani: The 80% Problem). Because the engineer cannot match generation throughput line-by-line, the advantage moves upstream to the gates that compress decision volume: an intent specification compresses many implementations into one acceptable region, a constraint-bearing harness compresses many code states into a verifiable subset, and evidence gates make verification mechanical rather than judgment-bound.
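The asymmetry can be sketched as a simple backlog calculation. This is an illustration under stated assumptions, not a model taken from the Faros AI data: the weekly figures are hypothetical, review capacity is held flat, and 99 PRs per week stands in for "98% more" than a baseline of 50.

```python
def backlog_after(weeks: int, prs_per_week: int, reviews_per_week: int) -> int:
    """Unreviewed PRs left after `weeks`, assuming constant arrival and review rates."""
    backlog = 0
    for _ in range(weeks):
        # The backlog grows by the shortfall each week and never drops below zero.
        backlog = max(0, backlog + prs_per_week - reviews_per_week)
    return backlog


# Hypothetical team: review capacity matches generation before agents arrive.
baseline = backlog_after(weeks=4, prs_per_week=50, reviews_per_week=50)

# Generation rises 98% (50 -> 99 PRs per week); review capacity stays at 50.
with_agents = backlog_after(weeks=4, prs_per_week=99, reviews_per_week=50)

print(baseline, with_agents)  # 0 196
```

Under these assumptions the queue grows by 49 PRs every week, which is why the leverage described above sits in gates that reduce how many decisions reach a human, not in faster line-by-line review.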

This is the mechanism Martin Fowler named "rigor relocation" — discipline does not vanish, it moves to constraint design, verification systems, and intent specification (Rigor Relocation). Intent-centric engineering names where the rigor relocates: the layer above authorship.

Why it works

The shift is more than relabeling because of enforcement locality. An intent specification fixes the acceptance region at the point an agent generates output; an evidence gate fires at the moment of decision, not after output has propagated through review. LangChain showed the effect empirically: a coding agent moved from the top 30 to the top 5 on Terminal Bench 2.0 with no model change, only harness investment in pre-completion checklists, loop detection, and structured verification (LangChain).
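As a rough illustration of two of those harness techniques, the sketch below shows a loop detector and a pre-completion checklist. It is not LangChain's implementation: the class name, function name, repeat limit, and checklist entries are all assumptions made for this example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class LoopDetector:
    """Flags when the same action has been taken `limit` times in a row."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.recent: Deque[str] = deque(maxlen=limit)

    def record(self, action: str) -> bool:
        """Store the action; return True if the last `limit` actions are identical."""
        self.recent.append(action)
        return len(self.recent) == self.limit and len(set(self.recent)) == 1


def failed_checks(checklist: Dict[str, Callable[[], bool]]) -> List[str]:
    """Names of checklist items that fail; the agent may finish only if this is empty."""
    return [name for name, check in checklist.items() if not check()]


detector = LoopDetector(limit=3)
actions = ["edit foo.py", "run tests", "run tests", "run tests"]
flags = [detector.record(action) for action in actions]
print(flags)  # [False, False, False, True]

# Hypothetical checklist; real checks would inspect the workspace.
checklist = {"tests_pass": lambda: True, "diff_is_nonempty": lambda: False}
print(failed_checks(checklist))  # ['diff_is_nonempty']
```

Both checks fire inside the agent loop, at the moment of decision, which is the enforcement-locality property the paragraph above describes.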

GitHub's data ratifies the framing operationally. The merge button "still needs (and, in our view, always will need) a developer fingerprint" because three categories remain "stubbornly human": architecture trade-offs, mentorship and culture, and ethical decisions about whether to build something (GitHub: Why Developers Will Always Own the Merge Button). Intent and oversight are precisely the work the merge button represents.

What relocates

The skills that gain weight relative to authorship are the ones that compress decisions or make verification mechanical:

Skill that gains weight | What it does
Intent specification | Compresses many possible implementations into one acceptance region
Context curation | Determines what the agent can access (the context-engineering discipline)
Architectural judgment | Sets boundaries agents cannot reliably reason about
Verification design | Builds evidence gates that catch bug classes, not individual bugs
Security and provenance | Tracks what was generated, by what, against what intent
Governance | Allocates accountability across the human-plus-agent system
Accountable judgment | Owns the merge decision when the evidence gates pass

These are not new disciplines. What changes is the weighting: it shifts away from authorship toward practices that were secondary when code-writing was the bottleneck.
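The provenance and accountable-judgment skills can be made concrete with a small record type. This is a minimal sketch and not a standard schema: the field names, the identifiers in the usage example, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """What was generated, by what, against what intent, and who owned the merge."""

    artifact_sha256: str
    generated_by: str
    intent_spec_id: str
    approved_by: str


def record_provenance(
    content: bytes, generated_by: str, intent_spec_id: str, approved_by: str
) -> ProvenanceRecord:
    """Bind a content hash of the generated artifact to its generator, spec, and approver."""
    digest = hashlib.sha256(content).hexdigest()
    return ProvenanceRecord(digest, generated_by, intent_spec_id, approved_by)


# Hypothetical identifiers, for illustration only.
patch = b"def add(a, b):\n    return a + b\n"
record = record_provenance(patch, "agent-run-17", "SPEC-042", "reviewer@example.com")
print(record.intent_spec_id, record.artifact_sha256[:12])
```

Freezing the dataclass keeps the record immutable once written, so the link between artifact, intent, and approver cannot be edited after the merge decision.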

When this backfires

Adopting the posture as a generic prescription without the conditions above produces worse outcomes than continuing to write code.

  • Spec-as-code displacement. Specs precise enough to drive reliable generation accumulate schemas and constraints until they become code-adjacent. Scott Logic found Spec Kit produced 2,000+ lines of Markdown per feature and still introduced bugs, while iterative prompting produced working code ~10× faster (Scott Logic). Addy Osmani names this the "curse of instructions": as a spec accumulates detail, model adherence degrades (Osmani, O'Reilly). See Spec Complexity Displacement.
  • Skill atrophy compounds. Engineers who only specify and supervise lose the capability to evaluate what they supervise. The METR study found developers using AI estimated they were 20% faster while actually running 19% slower — a 39-point perception gap (METR). Atrophy is self-concealing. See Skill Atrophy.
  • Vendor ToS undercut accountability. Treude's analysis of AI development-tool Terms of Service finds "a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users" (Treude, arXiv:2605.04532). Without contractual accountability the posture becomes a unilateral burden.
  • Bottleneck migration without capacity investment. The Faros AI 98%/91% asymmetry is the warning, not the prescription (Osmani: The 80% Problem). Teams that adopt the posture without mechanical evidence gates find their accountability surface outgrows their oversight capacity.

Do not relocate rigor upward as a posture. Invest in mechanical evidence gates and harness constraints first, then treat the intent-centric model as the operating mode that investment makes possible.

Example

A platform team running an agentic refactor across a large repo writes the change as an intent specification — the invariants the refactor must preserve, the types it must not change, the test cases that must continue to pass — rather than as a sequence of code edits. The harness enforces the intent mechanically: a type checker, the existing test suite, and a custom linter that fails on banned API patterns. The team's senior engineers spend their time on the intent specification and the harness rules; the agent does the authorship; the evidence gates produce the proof that the change is acceptable. Review focuses on the architectural decisions encoded in the intent spec and the structural changes the agent proposes — not on line-by-line code style. This is the pattern GitHub's four-stage maturity model labels "Strategist" — orchestrating agents as "creative director of code" rather than implementing line by line (GitHub Octoverse: New Identity of a Developer).
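A harness like the one described could be sketched as follows. This is an illustration under assumptions, not the team's tooling: the banned-call list and gate names are hypothetical, the type-checker and test-suite gates are stubbed placeholders, and the linter catches only direct calls by name, using Python's standard `ast` module.

```python
import ast
from typing import Callable, Dict, List

# Hypothetical banned list; a real harness would derive it from the intent spec.
BANNED_CALLS = {"eval", "exec"}


def banned_call_violations(source: str) -> List[str]:
    """Return one message per direct call to a banned name, ordered by line."""
    hits = [
        (node.lineno, node.func.id)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in BANNED_CALLS
    ]
    return [f"line {line}: banned call {name}()" for line, name in sorted(hits)]


def run_gates(gates: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every gate and report pass/fail per gate name."""
    return {name: gate() for name, gate in gates.items()}


# The agent's output is parsed, never executed.
agent_output = "value = eval(user_input)\nprint(value)\n"

results = run_gates({
    # Placeholders: a real harness would invoke the type checker and test suite.
    "type_check": lambda: True,
    "test_suite": lambda: True,
    "banned_api_lint": lambda: not banned_call_violations(agent_output),
})
mergeable = all(results.values())
print(results["banned_api_lint"], mergeable)  # False False
```

The change reaches human review only when every gate passes, so reviewers spend their attention on the intent spec and structural decisions, and the mechanical bug classes are caught before them.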

The same team rejects a proposal to apply the model to a one-off prototype. The overhead of an intent spec plus a harness is not recouped on a single agent run; the spec-first investment only pays off when agents iterate or fan out.

Key Takeaways

  • Intent-centric engineering pays back only under specific conditions: repeated or fanned-out generation, existing verification capacity, and reviewers who can evaluate agent output. Outside those conditions, build the harness first.
  • The mechanism is bottleneck migration plus enforcement locality — review cannot scale with code generation, so leverage moves to upstream gates that compress decision volume at the moment of decision.
  • The skills that gain weight are intent specification, context curation, architectural judgment, verification design, security and provenance, governance, and accountable judgment. The discipline is not new; the weighting is.
  • The sharpest failure modes are spec-as-code displacement, self-concealing skill atrophy, vendor ToS that shift responsibility onto users, and adopting the posture without the mechanical evidence gates that make oversight tractable.