Boring Technology Bias
Boring technology bias: LLMs recommend tools proportional to training-data frequency, not fitness for the problem — popular beats optimal by default.
The problem
When you ask an agent "what should I use for X?", the answer reflects training frequency. Greenfield recommendations cluster around a small set of dominant tools (GitHub Actions for CI/CD, Stripe for payments, shadcn/ui for components, Vercel for deployment), whether or not they fit the project. Less-popular alternatives are mentioned with less confidence or left out entirely.
This is a frequency prior, not a reasoning failure. It is the training-data analogue of pattern replication risk: exposure frequency, not quality, decides what the model picks. The more training examples a tool has, the more confidently the model recommends it. As a result, greenfield projects converge on the same narrow stack whatever their requirements.
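To make the frequency prior concrete, the toy sketch below ranks the same candidates two ways. The first ranking uses how often each tool appears in (hypothetical) training data. The second uses how well each tool fits the project. The tool names, mention counts, and fit scores are invented placeholders. Mention share is only a crude stand-in for model confidence, since real models expose no ranking like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    training_mentions: int  # hypothetical count of appearances in training data
    fit: float  # hypothetical 0..1 score for fit with *this* project's requirements


def recommend_by_frequency(tools: list[Tool]) -> list[str]:
    """Rank the way a pure frequency prior would: most-mentioned tool first."""
    return [t.name for t in sorted(tools, key=lambda t: t.training_mentions, reverse=True)]


def recommend_by_fit(tools: list[Tool]) -> list[str]:
    """Rank by fitness for the stated problem instead."""
    return [t.name for t in sorted(tools, key=lambda t: t.fit, reverse=True)]


def confidence(tools: list[Tool]) -> dict[str, float]:
    """Each tool's share of all mentions: a crude stand-in for model confidence."""
    total = sum(t.training_mentions for t in tools)
    return {t.name: t.training_mentions / total for t in tools}


# Placeholder candidates; none of these numbers are measurements.
candidates = [
    Tool("popular-ci", training_mentions=90_000, fit=0.55),
    Tool("mid-ci", training_mentions=6_000, fit=0.70),
    Tool("niche-ci", training_mentions=4_000, fit=0.90),
]

print(recommend_by_frequency(candidates))  # ['popular-ci', 'mid-ci', 'niche-ci']
print(recommend_by_fit(candidates))        # ['niche-ci', 'mid-ci', 'popular-ci']
print(confidence(candidates)["popular-ci"])  # 0.9
```

The two orderings are exact reverses here by construction. The point is that nothing in the frequency ranking looks at the project at all.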
Two distinct risks
The bias shows up differently by interaction mode:
```mermaid
flowchart LR
A[Ask agent:<br/>'What should I use?'] --> B[Recommendation bias<br/>Defaults to training frequency]
C[Tell agent:<br/>'Use this tool'] --> D[Execution capability<br/>Works fine with docs in context]
B -->|Unquestioned| E[Suboptimal stack adopted]
D -->|Context provided| F[Correct implementation]
style B fill:#c0392b,color:#fff
style D fill:#27ae60,color:#fff
```
Recommendation bias — what the agent suggests when you ask it to choose. It skews toward training frequency.
Execution capability — what the agent builds when you tell it what to use. It is less biased when documentation sits in context.
Agents are worse advisors than implementers.
The feedback loop
```mermaid
flowchart TB
A[Agent recommends Tool X] --> B[More developers adopt Tool X]
B --> C[More Tool X content in training data]
C --> D[Agent recommends Tool X with higher confidence]
D --> A
style A fill:#c0392b,color:#fff
```
Left unchecked, this loop lets training-data representation, not product quality, decide greenfield adoption.
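The loop can be sketched as a small simulation under two assumptions. First, Tool X's recommendation probability depends only on its share of the training corpus. Second, adopters' new content is folded back into the next corpus. The sharpness exponent, corpus size, and content volume are hypothetical parameters, not estimates. With sharpness above 1, a tool that starts with more than half the corpus gains share every round, and one that starts below half loses it.

```python
def recommend_prob(share: float, sharpness: float = 2.0) -> float:
    """Chance the agent recommends Tool X, given X's share of the training corpus.

    sharpness > 1 is an assumption: the model over-weights whichever option
    already dominates, instead of recommending in exact proportion to share.
    """
    x = share ** sharpness
    rest = (1.0 - share) ** sharpness
    return x / (x + rest)


def simulate(share: float, rounds: int, corpus: float = 100.0,
             new_content: float = 20.0, sharpness: float = 2.0) -> list[float]:
    """Track Tool X's corpus share through repeated A -> B -> C -> D cycles."""
    history = [share]
    for _ in range(rounds):
        p = recommend_prob(share, sharpness)          # A/D: agent recommends X
        x_content = share * corpus + p * new_content  # B/C: adopters publish X content
        corpus += new_content                         # the next training corpus grows
        share = x_content / corpus
        history.append(share)
    return history


for start in (0.4, 0.5, 0.6):
    print(start, [round(s, 3) for s in simulate(start, rounds=5)])
```

The 0.5 starting point is an unstable balance. Any initial lead, however small, is amplified rather than corrected, which is the dynamic the diagram describes.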
Concrete failure: deprecated API death spiral
Google deprecated its google-generativeai Python library in favor of google-genai. Models trained on the old library generate non-functional code built on the deprecated GenerativeModel() pattern. Developers conclude the API is broken and switch to competitors, so they never publish code that uses the correct pattern. That starves the training data of the new pattern and deepens the bias. Documented in googleapis/python-genai#1606.
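One way to catch this pattern in generated code is a static check. The sketch below uses Python's standard-library ast module to parse source. It flags imports of google.generativeai and calls to GenerativeModel(). Treating exactly these names as the deprecated surface is an assumption drawn from the example above. The replacement SDK is imported as "from google import genai" and built around a genai.Client() object, so the check leaves that pattern alone.

```python
import ast

# Assumption: the patterns worth flagging, taken from the deprecation described above.
DEPRECATED_MODULES = {"google.generativeai"}
DEPRECATED_CALLS = {"GenerativeModel"}


def find_deprecated(source: str) -> list[tuple[int, str]]:
    """Return (line, description) pairs for deprecated SDK usage in source.

    The code is only parsed, never executed, so neither SDK needs to be installed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in DEPRECATED_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if node.module in DEPRECATED_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
            elif node.module == "google" and any(a.name == "generativeai" for a in node.names):
                findings.append((node.lineno, "from google import generativeai"))
        elif isinstance(node, ast.Call):
            func = node.func
            # Handles both genai.GenerativeModel(...) and a bare GenerativeModel(...)
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in DEPRECATED_CALLS:
                findings.append((node.lineno, f"call to {name}()"))
    return sorted(findings)


OLD = 'import google.generativeai as genai\nmodel = genai.GenerativeModel("model-name")\n'
NEW = "from google import genai\nclient = genai.Client()\n"

print(find_deprecated(OLD))  # [(1, 'import google.generativeai'), (2, 'call to GenerativeModel()')]
print(find_deprecated(NEW))  # []
```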
Mitigation
Pin technology choices in project instruction files to override training data defaults.
```markdown
# CLAUDE.md (or AGENTS.md, copilot-instructions.md)
## Technology Stack
- Deployment: AWS CDK (not Vercel/Railway)
- CI/CD: GitLab CI (not GitHub Actions)
- Payments: Paddle (not Stripe)
- Components: Radix primitives (not shadcn/ui)
## Rules
- Do not suggest alternative tools unless asked
- When generating examples, use the stack above
```
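The agent consumes the instruction file as plain context. The sketch below is a separate, hypothetical review-side check, for example a CI step or editor hook. It extracts the "Category: Tool (not X)" bullets under the Technology Stack heading and flags suggestions that contradict a pin. The parse_pins and check_suggestion names are illustrative. So is the assumption that every pin fits that one-line format. The "not" text is kept as a raw string because a value like shadcn/ui makes splitting on "/" ambiguous.

```python
import re
from typing import Optional

INSTRUCTIONS = """\
# CLAUDE.md
## Technology Stack
- Deployment: AWS CDK (not Vercel/Railway)
- CI/CD: GitLab CI (not GitHub Actions)
- Payments: Paddle (not Stripe)
- Components: Radix primitives (not shadcn/ui)
## Rules
- Do not suggest alternative tools unless asked
"""

# Matches "- Category: Tool" with an optional trailing "(not X)".
PIN_LINE = re.compile(
    r"^-\s*(?P<category>[^:]+):\s*(?P<tool>[^(]+?)\s*(?:\(not\s+(?P<avoid>[^)]+)\))?\s*$"
)


def parse_pins(text: str) -> dict[str, dict[str, str]]:
    """Collect pin bullets that appear under '## Technology Stack' only."""
    pins: dict[str, dict[str, str]] = {}
    in_stack = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_stack = line == "## Technology Stack"
            continue
        match = PIN_LINE.match(line) if in_stack else None
        if match:
            pins[match["category"].strip().lower()] = {
                "tool": match["tool"].strip(),
                "avoid": (match["avoid"] or "").strip(),
            }
    return pins


def check_suggestion(pins: dict[str, dict[str, str]], category: str,
                     suggested: str) -> Optional[str]:
    """Return a warning if a suggestion contradicts a pin, else None."""
    pin = pins.get(category.lower())
    if pin is None or suggested == pin["tool"]:
        return None  # unpinned category, or the suggestion matches the pin
    return f"{category}: pinned to {pin['tool']} (avoid: {pin['avoid'] or 'n/a'}), got {suggested}"


pins = parse_pins(INSTRUCTIONS)
print(pins["ci/cd"])  # {'tool': 'GitLab CI', 'avoid': 'GitHub Actions'}
print(check_suggestion(pins, "Payments", "Stripe"))  # Payments: pinned to Paddle (avoid: Stripe), got Stripe
print(check_suggestion(pins, "Payments", "Paddle"))  # None
```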
For niche tools, ground the model in context: paste official docs, a README, or a few representative examples into the conversation. In-context material like this can let a model use library modules it has not seen in training well enough to produce correct code. Natural-language descriptions and raw implementations can work as well as worked demonstrations (Patel et al., Evaluating In-Context Learning of Libraries for Code Generation, arXiv 2311.09635).
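Here is a minimal sketch of that grounding step. It assumes you assemble the prompt yourself rather than rely on an agent's file-reading tools. Reference material goes first and the task last, with an explicit instruction not to guess. The function name, the section labels, and the character budget (a crude stand-in for a token limit) are all hypothetical.

```python
def build_grounded_prompt(task: str, docs: list[str], examples: list[str],
                          max_chars: int = 20_000) -> str:
    """Put reference material ahead of the task so the model reads it first.

    max_chars is a crude stand-in for a token budget (an assumption); items
    that no longer fit are skipped rather than truncated mid-text.
    """
    parts = []
    budget = max_chars
    for label, items in (("Reference documentation", docs), ("Worked example", examples)):
        for i, item in enumerate(items, start=1):
            block = f"### {label} {i}\n{item.strip()}\n"
            if len(block) > budget:
                continue  # skip anything that would exceed the remaining budget
            parts.append(block)
            budget -= len(block)
    parts.append(
        f"### Task\n{task.strip()}\n"
        "Use only the APIs shown above; if something you need is missing, say so instead of guessing."
    )
    return "\n".join(parts)


prompt = build_grounded_prompt(
    task="Add a webhook handler for refunds using the pinned payments library.",
    docs=["<paste the library's official webhook docs here>"],
    examples=["<paste one known-good handler from this repo here>"],
)
print(prompt)
```

Docs come before examples so that, under a tight budget, the authoritative reference survives first.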
| Mitigation | Mechanism |
|---|---|
| Pin stack in instruction files | Overrides default recommendations |
| Paste docs, READMEs, or seed examples into context | Compensates for limited training coverage |
| Add compiler/linter validation loops | Catches deprecated API usage automatically |
| Treat tool recommendations like a junior dev's | Verify reasoning, don't accept defaults |
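The validation-loop row can be sketched as follows, assuming generate is any callable that sends a prompt to an agent and returns Python source. The check here is deliberately cheap: a syntax-only compile() plus a substring scan for the deprecated google-generativeai patterns. In a real project it would be the project's own linter, type checker, or test suite. Failures are fed back into the next prompt until the code passes or the attempt budget runs out. The function names and the stand-in agent are hypothetical.

```python
from typing import Callable

# Assumption: a project-specific list; a real setup would run its own linter or type checker.
DEPRECATED_SNIPPETS = ("google.generativeai", "GenerativeModel(")


def check(code: str) -> list[str]:
    """Cheap validation: does it parse, and does it avoid known-deprecated APIs?"""
    errors = []
    try:
        compile(code, "<agent-output>", "exec")  # syntax check only; nothing is executed
    except SyntaxError as exc:
        errors.append(f"SyntaxError on line {exc.lineno}: {exc.msg}")
    errors += [f"deprecated API: {s}" for s in DEPRECATED_SNIPPETS if s in code]
    return errors


def generate_with_validation(generate: Callable[[str], str], task: str,
                             max_attempts: int = 3) -> tuple[str, list[str]]:
    """Call a (hypothetical) agent, feed check failures back, and retry."""
    prompt, code, errors = task, "", []
    for _ in range(max_attempts):
        code = generate(prompt)
        errors = check(code)
        if not errors:
            break
        feedback = "\n".join(f"- {e}" for e in errors)
        prompt = f"{task}\n\nYour previous attempt failed these checks:\n{feedback}\nFix them."
    return code, errors


# Stand-in agent: first answer uses the deprecated SDK, second uses google-genai.
replies = iter([
    "import google.generativeai as genai\nmodel = genai.GenerativeModel('model-name')\n",
    "from google import genai\nclient = genai.Client()\n",
])
seen_prompts = []


def fake_agent(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(replies)


code, remaining = generate_with_validation(fake_agent, "Call the model from Python.")
print(remaining)  # []
print(seen_prompts[1].splitlines()[2])  # Your previous attempt failed these checks:
```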
When this backfires
Overspecifying the technology stack in instruction files creates its own problems:
- Stack lock-in: pinning every tool stops agents from suggesting a better fit when requirements change mid-project.
- Onboarding friction: new contributors must learn the project's overridden defaults before the agent behaves predictably.
- False confidence: a pinned stack still needs human review, and this is where trust without verify bites. When training coverage for the pinned tool is thin, agents can implement it incorrectly and produce confident but broken code.
- Maintenance burden: locked stacks drift as pinned libraries release breaking changes, and no one tells the agent — the instruction file becomes a source of stale guidance.
Use instruction files to override defaults for non-negotiable choices (regulatory requirements, existing infrastructure) rather than as a blanket constraint on every dependency.
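That rule can also be checked mechanically, along with the maintenance-burden point above. The sketch assumes each pin records why it exists and which major version its guidance was written for. The two allowed reasons mirror the examples in the text. The Pin fields, the placeholder tool names, and the version numbers are hypothetical. In practice latest_major would come from a package registry or lockfile.

```python
from dataclasses import dataclass
from typing import Optional

# The two kinds of non-negotiable reason the text names.
ALLOWED_REASONS = {"regulatory", "existing-infrastructure"}


@dataclass(frozen=True)
class Pin:
    category: str
    tool: str
    pinned_major: int  # major version the instruction file was written against
    latest_major: int  # assumption: fetched from a registry or lockfile elsewhere
    reason: Optional[str] = None


def audit(pins: list[Pin]) -> list[str]:
    """Flag pins with no non-negotiable reason, and pins whose guidance may be stale."""
    warnings = []
    for pin in pins:
        if pin.reason not in ALLOWED_REASONS:
            warnings.append(f"{pin.category}: {pin.tool} has no non-negotiable reason; consider unpinning")
        if pin.latest_major > pin.pinned_major:
            warnings.append(
                f"{pin.category}: written for {pin.tool} v{pin.pinned_major}, "
                f"latest is v{pin.latest_major}; review the instruction file"
            )
    return warnings


# Hypothetical pins; tool names and versions are placeholders, not real data.
pins = [
    Pin("Deployment", "infra-lib", pinned_major=2, latest_major=2, reason="existing-infrastructure"),
    Pin("Components", "ui-kit", pinned_major=1, latest_major=2),
]
for warning in audit(pins):
    print(warning)
```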
Key Takeaways
- Agent tool recommendations track training-data frequency, not problem fit — popular beats optimal by default.
- Separate the two risks: recommendation bias (what the agent suggests) is strong, while bias during execution (building with a named tool whose docs are in context) is far weaker.
- Pin non-negotiable technology choices in the instruction file ecosystem and paste docs or seed examples into context for niche tools.
- A pinned stack is not a substitute for human review — overspecifying trades selection bias for lock-in and stale guidance.
Related
- Framework-First Agent Development — reaching for frameworks too early (distinct from training-data selection bias)
- Pattern Replication Risk — agents absorb and reproduce deprecated APIs and stale patterns from existing codebases, compounding training-data bias
- Trust Without Verify — accepting agent output without verification
- Instruction File Ecosystem — the mechanism for overriding agent defaults
- CLAUDE.md Convention — pin technology choices in Claude Code's project instruction file
- Agent-Driven Greenfield Product Development