Configuration Smells in AGENTS.md Files (Six-Smell Catalog)

Six named defects appear in 91 of 100 popular AGENTS.md and CLAUDE.md files — a greppable checklist for auditing the always-loaded coding-agent context.

Also known as

AGENTS.md Smells, CLAUDE.md Smells, Coding-Agent Config Smells

AGENTS.md and CLAUDE.md are loaded as always-on context at every session start, so every byte trades against the task budget. dos Santos et al. (June 2026) ran the first empirical mining study of these files: across 100 popular open-source repos, 91 carried at least one of six recurring defects, and three of them (Context Bloat, Skill Leakage, and Conflicting Instructions) frequently co-occur (arxiv 2606.15828).

The Six Smells

Smell | Prevalence | What it is | How to detect
Lint Leakage | 62% | Restating rules already enforced by a linter, formatter, or static analyser | LLM prompt analysis (93% precision)
Context Bloat | 42% | The file exceeds the size at which agents reliably honour its content | ≥200 lines
Skill Leakage | 35% | Rarely-used or task-specific instructions sitting in the always-loaded file instead of a skill loaded on demand | LLM prompt analysis (82% precision)
Conflicting Instructions | 28% | Contradictory rules creating ambiguity about expected behaviour | LLM prompt analysis (57% precision, the weakest of the six; treat as indicative)
Init Fossilization | 24% | Content generated by /init (or equivalent) and never modified since | Single-commit file history
Blind References | 16% | Bare path references with no pitch on when or why to read them | LLM prompt analysis (87% precision)

All prevalence figures, precisions, and named examples below are from the paper's 100-repo sample (arxiv 2606.15828).
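
The two detectors in the table that need no LLM (the ≥200-line threshold and the single-commit history check) can be sketched in a few lines of Python. This is a minimal illustration and not the paper's tooling: the function names are hypothetical, and it assumes that "single-commit file history" means exactly one commit touches the file.

```python
import subprocess
from pathlib import Path

BLOAT_THRESHOLD = 200  # lines; the catalog's ">=200 lines" cut-off


def has_context_bloat(text: str, threshold: int = BLOAT_THRESHOLD) -> bool:
    """True when the file's text has at least `threshold` lines."""
    return len(text.splitlines()) >= threshold


def commit_count(repo: Path, rel_path: str) -> int:
    """Count commits that touched rel_path (--follow tracks renames)."""
    out = subprocess.run(
        ["git", "-C", str(repo), "log", "--follow", "--format=%H", "--", rel_path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return len(out.split())


def is_init_fossil(n_commits: int) -> bool:
    """Assumption: a fossilised file has exactly one commit in its history."""
    return n_commits == 1
```

In use, something like `is_init_fossil(commit_count(Path("."), "AGENTS.md"))` would flag a file that was committed once and never touched again. Whether that file was actually produced by /init still needs a human look.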

Real Examples From the Paper

  • Lint Leakage: google/adk-python shipped Python indentation, line-length, and naming rules better handled by ruff or black.
  • Context Bloat: javascript-obfuscator's CLAUDE.md was 1,477 lines across 27 sections, with product documentation that belonged in a separate docs/ directory.
  • Skill Leakage: quickemu-project carried detailed "Adding a new OS" instructions used only for one task type.
  • Conflicting Instructions: inkline/inkline told the agent components live in packages/ui/components and in packages/components.
  • Init Fossilization: 24 projects with active ongoing development showed zero modifications to their AGENTS.md.
  • Blind References: SuperClaude_Framework referenced docs/plugin-reorg.md without describing what was inside.
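
The paper detects Blind References with LLM prompt analysis. As a cheap pre-filter before that step, a regex heuristic can surface candidate lines for review. The sketch below is a swapped-in technique and not the paper's method: the path pattern, the `MIN_CONTEXT_WORDS` threshold, and the function name are all illustrative assumptions. It flags any line that mentions a path-like token with fewer than a handful of other words around it.

```python
import re

# Path-like token: at least one "/" plus a file extension, e.g. docs/plugin-reorg.md
PATH_RE = re.compile(r"[\w.\-]+(?:/[\w.\-]+)+\.\w+")
MIN_CONTEXT_WORDS = 4  # hypothetical threshold, tune per repository


def blind_reference_candidates(text: str, min_words: int = MIN_CONTEXT_WORDS):
    """Return (line_number, paths) for lines that cite a path with little context."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        paths = PATH_RE.findall(line)
        if not paths:
            continue
        # Count the words left on the line once the paths themselves are removed.
        remainder = PATH_RE.sub(" ", line)
        words = re.findall(r"[A-Za-z]{2,}", remainder)
        if len(words) < min_words:
            hits.append((lineno, paths))
    return hits
```

A line such as "See docs/plugin-reorg.md" would be flagged, while "Read docs/testing.md before changing CI, it lists the required fixtures" would pass. The heuristic only measures how much text surrounds the path, not whether that text says when or why to read the file, so treat hits as prompts for review.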

Why It Works

Five of the six are signal-to-token defects — Lint Leakage, Context Bloat, Skill Leakage, Init Fossilization, and Blind References reduce the useful fraction of the always-loaded context. Conflicting Instructions reduces its resolvability. Independent benchmark work converges on the same mechanism: Gloaguen et al. measured −3% task success and +20% inference cost for LLM-generated context files on SWE-bench Lite and AGENTbench, and only +4% success at +19% cost for human-written ones (arxiv 2602.11988). Naming each defect turns "our CLAUDE.md is messy" into a checklist item with a known fix — extract style rules to the linter, split rarely-used sections into on-demand skills, resolve contradictions, update fossilised content, and pitch each external reference.

When This Backfires

The catalog is calibrated against active multi-file repos. It earns its keep less in three settings:

  • Tiny utilities where the CLAUDE.md is already sub-200 lines and the codebase is stable.
  • Projects already on a strict pointer-map regime — see AGENTS.md as a Table of Contents; the discipline is already enforced.
  • Single-author short-lived prototypes where the developer holds the whole project in their head and the agent's confusion surfaces in chat rather than burning cost invisibly.

The Conflicting Instructions detector is also the weakest at 57% precision (arxiv 2606.15828) — treat its 28% prevalence figure as indicative, not definitive.

Key Takeaways

  • 91 of 100 popular repos carry at least one of the six smells; this is the modal state of AGENTS.md and CLAUDE.md, not a fringe failure (arxiv 2606.15828).
  • Lint Leakage (62%) is the most common; Context Bloat, Skill Leakage, and Conflicting Instructions frequently co-occur, with each of the latter two raising Context Bloat likelihood by ~83%.
  • The catalog converges with independent benchmark work showing context files cost without proportional success gains (arxiv 2602.11988).
  • Use the table as a greppable audit checklist; treat the 57%-precision Conflicting Instructions detector as a flag for human review, not an auto-fix trigger.