Eval-Driven Development

A practitioner pathway for teams adopting eval-driven development — the discipline of defining measurable success criteria before writing agent feature code.

Traditional testing assumes deterministic systems: same input, same output. Agents are non-deterministic. The same prompt, task, and environment can produce different results across runs, so a single passing run says little about how often the agent succeeds. This pathway teaches the eval-driven development discipline, which replaces gut-feel quality assessment with reproducible, automated measurement.
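Here is a minimal sketch of the measurement this paragraph describes. The harness scores each task over several runs and reports a pass rate, not a single pass/fail verdict. `EvalTask`, `fake_agent`, and `run_suite` are hypothetical names. `fake_agent` is a random stub standing in for a real agent call, and the graders are simple code-based string checks.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalTask:
    task_id: str
    prompt: str
    # Success criterion written before the feature code: True means the output passes.
    grader: Callable[[str], bool]


def fake_agent(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a non-deterministic agent call."""
    # Produces the desired output (the uppercased prompt) about 70% of the time.
    return prompt.upper() if rng.random() < 0.7 else prompt


def run_suite(
    tasks: list[EvalTask],
    agent: Callable[[str, random.Random], str],
    trials: int,
    seed: int = 0,
) -> dict[str, float]:
    """Run every task `trials` times and return the pass rate per task id."""
    # The seed only fixes the stub's randomness; a real agent cannot be controlled this way.
    rng = random.Random(seed)
    rates: dict[str, float] = {}
    for task in tasks:
        passes = sum(task.grader(agent(task.prompt, rng)) for _ in range(trials))
        rates[task.task_id] = passes / trials
    return rates


tasks = [
    EvalTask("upper-hello", "hello", lambda out: out == "HELLO"),
    EvalTask("upper-ok", "ok", lambda out: out == "OK"),
]
print(run_suite(tasks, fake_agent, trials=20))
```

Each score is a rate over repeated trials. Two runs of a suite against a real agent can therefore be compared as numbers, not by eyeballing individual outputs.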

The modules progress from foundational concepts through hands-on suite construction to production-grade hardening. Each builds on the previous — start at the beginning if eval-driven development is new to your team.

Core Modules

Module	Topic	Duration
What Evals Are and Why Agents Need Them	How evals differ from tests, the non-determinism problem, pass@k vs pass^k, why traditional QA fails for agents	30–45 min
Writing Your First Eval Suite	Task design, success criteria, grader selection, running a baseline, the 20–50 task starting point	30–45 min
Grading Strategies	Code-based grading, LLM-as-judge, human review, calibration against human judgment, when to use each	30–45 min
The Eval-First Development Loop	Eval-driven workflow, evals as executable specifications, converting existing manual checks, model upgrade testing	30–45 min
Hardening Evals for Production	Anti-reward hacking, incident-to-eval synthesis, golden query pairs, layered accuracy defense, grader validation	30–45 min
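The pass@k vs pass^k distinction covered in the first module is easiest to see numerically. This sketch uses the commonly cited combinatorial estimators and assumes each task was run n times, with c passing trials:

- pass@k estimates the chance that at least one of k trials passes.
- pass^k estimates the chance that all k trials pass.

The function names are illustrative. The module itself may define or estimate these metrics differently.

```python
from math import comb


def _validate(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k trials passes, given c of n observed passes."""
    _validate(n, c, k)
    # 1 minus the chance that all k drawn trials come from the n - c failures.
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k trials pass (pass^k), given c of n observed passes."""
    _validate(n, c, k)
    # Chance that all k drawn trials come from the c successes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # A task that passed 7 of 10 trials.
    print(f"pass@3 = {pass_at_k(10, 7, 3):.3f}")  # pass@3 = 0.992
    print(f"pass^3 = {pass_hat_k(10, 7, 3):.3f}")  # pass^3 = 0.292
```

Take a task that passes 7 of 10 runs. It looks almost solved under pass@3, yet it succeeds on all three attempts only about 29% of the time. That gap matters whenever the agent has to succeed every time it is invoked.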

Supplementary

Module	Topic	Duration
Step-by-Step: Building Your First Eval-Driven Feature	Hands-on walkthrough building a PR description generator — tasks, graders, baseline, iteration, and shipping	60–90 min

Prerequisites

This pathway is self-contained but benefits from familiarity with:

- Foundational Disciplines: especially the harness engineering module
- Eval Engineering: the complementary module that this pathway expands into a full course
- Eval-Driven Development: the reference page covering the core workflow pattern