Law of Triviality in AI PRs¶
Reviewers bikeshed small changes and rubber-stamp large ones. AI agents produce large diffs by default, so the code that needs the most scrutiny gets the least.
The pattern¶
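Parkinson's Law of Triviality (1957) observes that groups spend the most time on the issues that are easiest to grasp, so attention tends to run inversely to complexity. In code review, small diffs get debated line by line while large ones are waved through.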
Agents routinely produce PRs past the threshold where review stays effective. Hand-written tweaks attract debate while AI diffs pass unexamined. This differs from PR Scope Creep: the cause is reviewer psychology, not scope.
Defect detection collapses with size¶
The SmartBear/Cisco study (2,500 reviews) found review most effective at 200-400 LOC over 60-90 minutes; defect detection falls off beyond roughly 400 LOC. Propel quantifies the drop:
| PR Size (lines) | Defect Detection Rate | Review Time | Comments per PR |
|---|---|---|---|
| 1-200 | 87% | ~45 min | 3.2 |
| 101-300 | ~70% | ~60 min | ~4.1 |
| 301-600 | 65% | ~2 hr | 2.4 |
| 1,000+ | 28% | ~4.2 hr | 1.8 |
Roughly four hours on a PR of 1,000+ LOC yields fewer comments than 45 minutes on a PR of 200 LOC or less. The extra time reflects fatigue and disengagement, not deeper scrutiny.
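To make the disengagement concrete, the table's figures can be turned into comments per review hour. This is a small arithmetic sketch: the rows are copied from the table above, with the approximate review times converted to hours, and the `comments_per_hour` helper is introduced here only for illustration.

```python
# (size band, review hours, comments per PR), copied from the table above.
ROWS = [
    ("1-200", 0.75, 3.2),
    ("101-300", 1.0, 4.1),
    ("301-600", 2.0, 2.4),
    ("1,000+", 4.2, 1.8),
]


def comments_per_hour(comments: float, hours: float) -> float:
    """Review comments produced per hour of review time."""
    return comments / hours


rates = {band: comments_per_hour(comments, hours) for band, hours, comments in ROWS}

for band, rate in rates.items():
    print(f"{band:>8} LOC: {rate:.2f} comments per review hour")
```

On these figures the smallest band produces about ten times as many comments per review hour as the largest.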
AI makes it worse¶
CodeRabbit finds that AI PRs contain about 1.7x as many issues as human-written code, including roughly 3x the readability issues and 75% more logic defects. Three mechanisms compound the problem:
- Template blindness: AI output follows familiar patterns, so reviewers skim and bugs hide in boilerplate. (AsyncSquad Labs)
- AI brain fry: sustained AI oversight produces mental fog and higher error rates. (HBR / Help Net Security)
- Nyquist under-sampling: code production tripled while review sampling stayed flat, so defects alias as passing. (Bryan Finster)
```mermaid
graph LR
    A[Agent generates<br/>large diff] --> B[Reviewer overwhelmed]
    B --> C[Rubber-stamp approval]
    C --> D[Defects ship]
    D --> E[Trust in review<br/>erodes]
    E --> F[Even less scrutiny<br/>on next PR]
    F --> B
```
Mitigation stack¶
1. Constrain batch size¶
Target 100-300 LOC per PR. Split agent work into atomic commits and enforce size gates in CI.
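A size gate can be as simple as summing the output of `git diff --numstat`. The sketch below is one possible shape, not a prescribed implementation: the function names and the 400-line default are assumptions, and counting deletions as well as insertions is a choice made here.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>".
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-"; count them as zero lines.
        total += int(added) if added.isdigit() else 0
        total += int(deleted) if deleted.isdigit() else 0
    return total


def size_gate(numstat: str, limit: int = 400) -> tuple[bool, str]:
    """Return (passed, message) for a hypothetical CI size check."""
    n = changed_lines(numstat)
    if n > limit:
        return False, f"PR changes {n} lines; limit is {limit}. Split into smaller atomic PRs."
    return True, f"PR changes {n} lines; within the {limit}-line limit."
```

In CI, the input would come from something like `git diff --numstat origin/main...HEAD`, and a failed gate would exit non-zero.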
2. Tiered review¶
Use tiered code review:
| Tier | Reviewer | Scope |
|---|---|---|
| 1 | Automated (lint, SAST, tests) | Syntax, style, known vulnerability patterns |
| 2 | AI-augmented review | Flag risk hotspots, check for common AI mistakes |
| 3 | Human expert | Architecture, business logic, domain context |
See Agentic Code Review Architecture.
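One way to read the tiers is as a pipeline in which each tier gates the next and passes its findings forward. The sketch below assumes that reading; `review_plan` and its parameters are hypothetical names, not part of any tool.

```python
def review_plan(checks_passed: bool, hotspots: list[str]) -> list[str]:
    """Build an ordered review plan from tier 1 and tier 2 results.

    checks_passed: outcome of tier 1 (lint, SAST, tests).
    hotspots: risk areas flagged by tier 2 (AI-augmented review).
    """
    plan = ["tier 1: automated checks"]
    if not checks_passed:
        # Human attention is not spent until the automated tier is green.
        plan.append("stop: return to author until automated checks pass")
        return plan
    plan.append("tier 2: AI-augmented review")
    # Tier 3 starts from the flagged hotspots rather than from line one.
    focus = ", ".join(hotspots) if hotspots else "architecture and business logic"
    plan.append(f"tier 3: human expert, focus on {focus}")
    return plan
```

The point of the ordering is that the human reviewer's limited attention goes to the areas the earlier tiers could not clear.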
3. Semantic diffing¶
Review behavior changes, not raw lines. AST diffs and API-contract analysis surface changed signatures and interfaces while filtering out formatting noise.
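As a minimal illustration of the idea, the sketch below uses Python's standard `ast` module to compare function signatures between two versions of a file, ignoring comments and formatting. It is deliberately narrow: it keys on function name only (so same-named methods in different classes would collide), it does not inspect function bodies, and `ast.unparse` requires Python 3.9 or later. The helper names are assumptions.

```python
import ast


def function_signatures(source: str) -> dict[str, str]:
    """Map each function name in the source to its argument list as text."""
    tree = ast.parse(source)
    return {
        node.name: ast.unparse(node.args)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def semantic_diff(old: str, new: str) -> dict[str, list[str]]:
    """Report functions added, removed, or given a different signature."""
    before, after = function_signatures(old), function_signatures(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "signature_changed": sorted(
            name
            for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

A reviewer given this summary sees that an API contract changed before reading a single raw line of the diff.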
4. BDD-first specification¶
Define expected behavior before the agent codes. Review then becomes validation against pre-agreed criteria. See Spec-Driven Development.
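A sketch of what pre-agreed criteria can look like in practice: scenarios written as data before any implementation exists, then run against whatever the agent produces. Everything here is hypothetical, including the billing rules, the function names, and the figures; the second implementation is a deliberately wrong one that applies the discount after tax.

```python
from decimal import Decimal
from typing import Callable

# Hypothetical contract: (subtotal, tax_rate, discount) -> total
InvoiceFn = Callable[[Decimal, Decimal, Decimal], Decimal]

# Scenarios agreed before the agent writes code:
# (description, inputs, expected total)
SCENARIOS = [
    ("Given no discount, when tax is 20%, then total is subtotal plus tax",
     (Decimal("100"), Decimal("0.20"), Decimal("0")), Decimal("120.00")),
    ("Given a discount, when tax is 20%, then tax applies after the discount",
     (Decimal("100"), Decimal("0.20"), Decimal("10")), Decimal("108.00")),
    ("Given a discount above the subtotal, then the total is zero",
     (Decimal("50"), Decimal("0.20"), Decimal("80")), Decimal("0.00")),
]


def failed_scenarios(impl: InvoiceFn) -> list[str]:
    """Return the descriptions of scenarios the implementation fails."""
    return [desc for desc, args, expected in SCENARIOS if impl(*args) != expected]


def invoice_total(subtotal: Decimal, tax_rate: Decimal, discount: Decimal) -> Decimal:
    """Stand-in for a correct implementation: discount first, then tax."""
    taxable = max(subtotal - discount, Decimal("0"))
    return (taxable * (1 + tax_rate)).quantize(Decimal("0.01"))


def buggy_invoice_total(subtotal: Decimal, tax_rate: Decimal, discount: Decimal) -> Decimal:
    """Stand-in for a plausible logic error: discount applied after tax."""
    return (subtotal * (1 + tax_rate) - discount).quantize(Decimal("0.01"))
```

The reviewer's question shifts from "is this diff right?" to "does `failed_scenarios` come back empty?", which does not get harder as the diff grows.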
When this backfires¶
Size limits fail in three cases: genuinely atomic changes such as cross-cutting refactors and schema migrations; monorepos where the cost of coordinating many PRs exceeds the review benefit; and LOC gates that force superficial splits, producing many small PRs that are collectively incoherent.
Example¶
An agent completes a feature sprint and opens a single 1,400-LOC PR touching auth, billing, and the data model. The reviewer spends 3 hours skimming and approves with two style comments. A logic error in the billing calculation ships.
Split into seven PRs of roughly 200 LOC each, grouped by area (auth, billing, data model), the same work would have landed in the table's top band: 87% defect detection and about 3 comments per PR. The billing bug would very likely have been caught.
CI enforcement keeps scope in check:
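```yaml
# .github/workflows/pr-size.yml
# Assumes an earlier checkout step fetched origin/main (e.g. fetch-depth: 0).
- name: Check PR size
  run: |
    # Sum added lines per file; binary files report "-" and count as 0.
    LINES=$(git diff --numstat origin/main...HEAD | awk '{ n += $1 } END { print n + 0 }')
    if [ "$LINES" -gt 400 ]; then
      echo "PR exceeds 400 LOC. Split into smaller atomic PRs."
      exit 1
    fi
```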
Related¶
- The Bottleneck Migration — systemic shift from generation to review as the binding constraint
- PR Scope Creep as a Human Review Bottleneck
- Comprehension Debt
- LLM Code Review Overcorrection
- Shadow Tech Debt
- Agentic Code Review Architecture
- Diff-Based Review Over Output Review
- Cognitive Load and AI Fatigue
- Signal Over Volume in AI Review