Continuous Triage: Automating Issue Classification with AI Workflows

AI agents replace manual issue triage by classifying, labeling, and routing issues on every event or schedule, running continuously with read-only defaults and constrained writes.

Three triage operations

Continuous triage splits into three operations. You can automate each one on its own:

| Operation | Input | Output | Frequency |
| --- | --- | --- | --- |
| Summarize | Issue body, comments | Structured summary for triagers | On issue creation |
| Label | Issue content, repo context | Category labels (bug, feature, docs) | On issue creation/update |
| Route | Labels, team assignments | Assignment to team or individual | After labeling |

These operations form a pipeline. Summarize provides context, labeling classifies, and routing dispatches. Each can run on its own or chain in sequence.
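A minimal sketch of how the three operations might chain, with simple stand-ins where a real workflow would call a model. The names `Issue`, `summarize`, `label`, `route`, `triage`, and `TEAM_FOR_LABEL`, along with the keyword rules and team names, are hypothetical placeholders and not part of any GitHub API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    title: str
    body: str
    labels: List[str] = field(default_factory=list)
    assignee: Optional[str] = None
    summary: str = ""


# Hypothetical label-to-team mapping; a real repo would define its own.
TEAM_FOR_LABEL = {"bug": "runtime-team", "feature": "product-team", "docs": "docs-team"}


def summarize(issue: Issue) -> Issue:
    # Stand-in for a model call: keep the title plus the first line of the body.
    lines = issue.body.strip().splitlines()
    first_line = lines[0] if lines else ""
    issue.summary = f"{issue.title}: {first_line}"
    return issue


def label(issue: Issue) -> Issue:
    # Stand-in classifier: keyword rules instead of an LLM.
    text = f"{issue.title} {issue.body}".lower()
    if "error" in text or "crash" in text:
        issue.labels.append("bug")
    elif "typo" in text or "documentation" in text:
        issue.labels.append("docs")
    else:
        issue.labels.append("feature")
    return issue


def route(issue: Issue) -> Issue:
    # Routing reads only the labels, so it can follow any labeling step.
    for name in issue.labels:
        if name in TEAM_FOR_LABEL:
            issue.assignee = TEAM_FOR_LABEL[name]
            break
    return issue


def triage(issue: Issue) -> Issue:
    # Chain the three operations; each one also works when called alone.
    return route(label(summarize(issue)))
```

Because each function takes and returns an `Issue`, any stage can be swapped out or validated separately without touching the others.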

Implementation with GitHub Agentic Workflows

GitHub Agentic Workflows are the main way to build continuous triage. You define each workflow as a Markdown file with YAML frontmatter that specifies triggers, permissions, and safe outputs. GitHub compiles it to a .lock.yml file that GitHub Actions runs (GitHub Blog).

A triage workflow runs read-only by default. To let it write, you declare safe outputs: pre-approved actions such as add-label, add-comment, or add-assignee. Each safe output has a volume limit, and its content is sanitized before it runs (GitHub Blog).

# Frontmatter for a triage workflow
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: write
  models: read
safe-outputs:
  - add-label:
      allowed-labels: [bug, feature, docs, duplicate, needs-info]
  - add-comment:
      max-count: 1

This permission model lets agents run continuously. The agent can classify thousands of issues without risk of unbounded writes.
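To make the bounded-write idea concrete, here is a rough sketch of how a gate with an allow-list and a per-run volume limit could behave. It illustrates the concept only and is not GitHub's implementation; `SafeOutputGate`, `allowed_labels`, and `max_comments` are hypothetical names.

```python
class SafeOutputGate:
    """Illustrative gate for agent writes; not GitHub's implementation."""

    def __init__(self, allowed_labels, max_comments=1):
        self.allowed_labels = set(allowed_labels)
        self.max_comments = max_comments
        self.comments_written = 0
        self.applied = []  # record of the writes the gate let through

    def add_label(self, name):
        # Labels outside the allow-list are rejected, whatever the agent asks for.
        if name not in self.allowed_labels:
            return False
        self.applied.append(("label", name))
        return True

    def add_comment(self, text):
        # The volume limit caps how many comments one run can post.
        if self.comments_written >= self.max_comments:
            return False
        self.comments_written += 1
        self.applied.append(("comment", text))
        return True
```

The agent can request any write it likes, but only requests that pass the gate reach the repository, which is what keeps the total number of writes bounded.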

Pre-built triage actions

GitHub ships two Actions for AI triage. Both use the workflow GITHUB_TOKEN with models: read permission, so you need no external API keys (GitHub Changelog):

AI Assessment Comment Labeler (github/ai-assessment-comment-labeler) runs multiple prompt files in parallel against issue content. It applies structured labels (ai:<prompt-stem>:<assessment>), supports comment suppression for silent classification, and outputs JSON for later workflow steps.

AI Moderator (github/ai-moderator) detects spam, link spam, and AI-generated content on issues and comments. It auto-labels flagged content and can minimize it. It also supports custom prompt overrides for team-specific moderation rules.
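A later workflow step that consumes the labeler's structured labels could split them as sketched below. This assumes only the `ai:<prompt-stem>:<assessment>` shape described above; the function name `parse_ai_label` and the sample label values are made up for illustration.

```python
def parse_ai_label(label):
    """Split 'ai:<prompt-stem>:<assessment>' into (prompt_stem, assessment).

    Returns None for labels that do not follow that shape.
    """
    parts = label.split(":", 2)
    if len(parts) != 3 or parts[0] != "ai" or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


def assessments(labels):
    # Keep only the AI-generated labels, keyed by the prompt that produced them.
    parsed = (parse_ai_label(name) for name in labels)
    return {stem: assessment for stem, assessment in filter(None, parsed)}
```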

Classify-then-route pattern

The routing pattern from Anthropic's agent design maps directly to triage. A classifier agent determines the issue category, then routes to a specialized follow-up process. This avoids the performance loss that comes from tuning a single prompt for mixed inputs (Anthropic: Building Effective Agents).

graph TD
    A[Issue Created] --> B[Classifier Agent]
    B -->|Bug| C[Bug Triage]
    B -->|Feature| D[Feature Backlog]
    B -->|Docs| E[Docs Queue]
    B -->|Duplicate| F[Link & Close]
    B -->|Needs Info| G[Request Details]

Each downstream handler can use a different prompt, model, or safe-output set, tuned for its category rather than handling every category in one pass.
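The diagram above can be sketched as a dispatch table. The handler and queue names are hypothetical, and falling back to the needs-info handler for unknown categories is an assumption made for this sketch.

```python
def to_queue(queue_name):
    # Each handler here only records a destination. A real handler would
    # carry its own prompt, model, or safe-output set.
    def handler(issue):
        return {"issue": issue["number"], "queue": queue_name}
    return handler


HANDLERS = {
    "bug": to_queue("bug-triage"),
    "feature": to_queue("feature-backlog"),
    "docs": to_queue("docs-queue"),
    "duplicate": to_queue("link-and-close"),
    "needs-info": to_queue("request-details"),
}


def dispatch(category, issue):
    # Assumption: a category the classifier was never meant to emit is
    # treated as needs-info rather than dropped.
    handler = HANDLERS.get(category, HANDLERS["needs-info"])
    return handler(issue)
```

Keeping the classifier's output down to a small set of keys means each handler can change independently, as long as the category names stay stable.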

Tool design for classification

Label definitions in triage tools should be explicit and mutually exclusive. Each should include concrete examples that show when the label applies. Anthropic's advanced tool use guidance recommends one to five examples per tool to reduce classification ambiguity (Anthropic: Advanced Tool Use).

Effective label definitions include:

  • Name and description — what the label means in this project's context
  • Inclusion criteria — concrete examples of issues that receive this label
  • Exclusion criteria — what this label does not cover, to prevent overlap
  • Priority signal — whether this label implies urgency
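One way to keep those four parts together is a small record that renders itself into prompt text. This is a sketch under assumed names (`LabelDefinition`, `to_prompt`) and an assumed output layout, not a format required by any tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LabelDefinition:
    name: str
    description: str                    # what the label means in this project
    include_examples: Tuple[str, ...]   # issues that should receive the label
    exclude_examples: Tuple[str, ...]   # near misses that should not
    urgent: bool = False                # priority signal

    def to_prompt(self) -> str:
        # Render the definition as plain text for a classifier prompt.
        lines = [f"{self.name}: {self.description}"]
        lines += [f"  applies to: {ex}" for ex in self.include_examples]
        lines += [f"  does not apply to: {ex}" for ex in self.exclude_examples]
        if self.urgent:
            lines.append("  priority: urgent")
        return "\n".join(lines)
```

For example, `LabelDefinition("bug", "Broken behavior", ("crash on save",), ("request for dark mode",), urgent=True).to_prompt()` yields a four-line block with one inclusion example, one exclusion example, and the priority line.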

Context loading for high-volume repos

Just-in-time context loading applies directly to triage. Load issue metadata lightly (title, labels, first paragraph), then fetch full details only when classification needs deeper analysis. This avoids context exhaustion on repos that process hundreds of issues per day (Anthropic: Context Engineering).
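A minimal sketch of that two-step loading, assuming issues arrive as dictionaries and that the caller supplies the classifiers and the fetch function. The names `light_view`, `classify_issue`, `quick_classifier`, `deep_classifier`, and `fetch_full` are all hypothetical.

```python
def light_view(issue):
    # Keep only cheap metadata: title, labels, and the first paragraph.
    first_paragraph = issue["body"].split("\n\n", 1)[0]
    return {"title": issue["title"], "labels": issue["labels"], "excerpt": first_paragraph}


def classify_issue(issue, quick_classifier, deep_classifier, fetch_full):
    # First pass on the light view. The quick classifier is assumed to
    # return a (label, confident) pair.
    label, confident = quick_classifier(light_view(issue))
    if confident:
        return label
    # Only now pay for the full issue (comments, attachments, history).
    return deep_classifier(fetch_full(issue["number"]))
```

The full fetch happens only on the uncertain path, so easy issues never load their full content into context.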

For high-volume repos, schedule batch triage rather than triggering on every event. This spreads the cost. Each run processes accumulated issues in one agent session rather than spawning a separate session per issue.
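If a scheduled run finds a large backlog, it may help to cap how many issues go into one agent session so the context stays manageable. The cap is an assumption added for this sketch, not something the source prescribes, and `chunk` and `max_per_session` are hypothetical names.

```python
def chunk(issues, max_per_session):
    """Split an accumulated backlog into batches of bounded size."""
    if max_per_session < 1:
        raise ValueError("max_per_session must be at least 1")
    return [
        issues[start:start + max_per_session]
        for start in range(0, len(issues), max_per_session)
    ]
```

Each batch would then be handed to one agent session, keeping the per-session context bounded while still avoiding one session per issue.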

Why it works

The classify-then-route pattern tends to beat a single general-purpose triage prompt. LLMs lose accuracy when one prompt must handle unrelated intent types at once, because each category pulls the output toward a different structure. Routing to a specialized prompt per category narrows the label space each prompt has to cover, which improves both precision and reliability. Research on LLM-based intent detection supports this: a narrower label space reduces out-of-scope errors and raises confidence on in-scope classifications (Intent Detection in the Age of LLMs, arXiv 2410.01627).

When this backfires

Continuous AI triage adds cost and complexity that manual triage avoids on low-volume repos. Three conditions make it worse than the manual alternative:

  • Small repos with predictable issue types — when a repo gets fewer than 20 issues per month and labels rarely change, a human triages in seconds, with better judgment on edge cases than a general-purpose model.
  • High-volume bursts with large context — models handling hundreds of issues at once can fail to follow instructions or skip tasks when context windows fill with accumulated issue content. GitHub's own documentation of Agentic Workflows notes that large context and complex tasks cause tasks to be skipped or instructions to be ignored (GitHub Blog).
  • Non-determinism in mission-critical routing — the same workflow can produce different label assignments on different runs. Where incorrect routing causes SLA breaches or security escalation misses, AI triage needs a human review layer rather than running fully autonomously (GitHub Agentic Workflows technical preview).
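One possible shape for a human review layer is to run the classifier several times and escalate whenever the runs disagree. This is a sketch of that idea, not a feature of GitHub Agentic Workflows; `label_or_escalate`, `min_agreement`, and the `needs-human-review` label are hypothetical.

```python
from collections import Counter


def label_or_escalate(classify, issue, runs=3, min_agreement=1.0):
    """Return the majority label, or escalate when runs disagree too much."""
    votes = Counter(classify(issue) for _ in range(runs))
    label, count = votes.most_common(1)[0]
    if count / runs >= min_agreement:
        return label
    # Disagreement across runs is treated as a signal for human review.
    return "needs-human-review"
```

With the default `min_agreement=1.0`, any disagreement escalates. Lowering the threshold trades review load against the risk of a wrong route, and the repeated runs multiply the per-issue cost.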

Service failures are also a real risk. AI services go down and rate limits apply. Triage pipelines should degrade gracefully. When the model is unavailable, fall back to an unlabeled open state rather than blocking issue creation.
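Graceful degradation could look roughly like the following, assuming the model client signals outages and rate limits with a dedicated exception. `ModelUnavailable` and `triage_with_fallback` are hypothetical names introduced for this sketch.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised by a model client on outage or rate limit."""


def triage_with_fallback(issue, classify):
    try:
        label = classify(issue)
    except ModelUnavailable:
        # Leave the issue open and unlabeled so a later run or a human
        # can pick it up. Issue creation itself is never blocked.
        return {"number": issue["number"], "labels": [], "triaged": False}
    return {"number": issue["number"], "labels": [label], "triaged": True}
```

The `triaged` flag lets a scheduled run find the issues that were skipped and retry them once the service recovers.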

Rollout sequencing

  1. Read-only first. Start with summarization only: no labels, no routing. Review the summaries (posted as comments, the only write enabled at this stage) to judge classification quality before enabling label or assignment writes.
  2. Label with review. Enable add-label safe outputs, but review label accuracy for one to two weeks. Adjust prompts based on misclassifications.
  3. Route to teams. Once labeling accuracy holds, add assignment rules that route labeled issues to the right team or individual, following the classify-then-route composition pattern.
  4. Close duplicates. This is the highest-risk operation. Enable it only after the classifier shows reliable duplicate detection.
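The four stages above can be modeled as a cumulative list of enabled write actions, so that each stage adds exactly one new capability. The stage names and the `close-issue` output name are assumptions for illustration. The other output names follow the ones used in this article's examples.

```python
# Stages in rollout order, each paired with the one write action it adds.
STAGES = [
    ("summarize", "add-comment"),
    ("label", "add-label"),
    ("route", "add-assignee"),
    ("close-duplicates", "close-issue"),  # hypothetical output name
]


def enabled_outputs(stage):
    """Return every write action enabled once the given stage is reached."""
    names = [name for name, _ in STAGES]
    index = names.index(stage)  # raises ValueError for an unknown stage
    return [output for _, output in STAGES[: index + 1]]
```

Moving to the next stage is then a one-line config change, and rolling back removes only the most recently added write.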

Cost model

Copilot-powered triage workflows consume Copilot premium requests per run. Event-triggered workflows scale with issue volume. Scheduled batch workflows spread the cost across all accumulated issues (GitHub Blog).
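A rough way to compare the two trigger styles is to count workflow runs per day. The sketch below assumes every run costs about the same, which the source does not state, and it does not model how many premium requests a single run consumes. `compare_runs` and its parameters are hypothetical names.

```python
def compare_runs(issues_per_day, scheduled_runs_per_day):
    """Compare daily run counts for event-triggered and scheduled triage."""
    event_runs = issues_per_day  # one run per opened issue
    return {
        "event_runs": event_runs,
        "scheduled_runs": scheduled_runs_per_day,
        "issues_per_scheduled_run": issues_per_day / scheduled_runs_per_day,
        "scheduled_uses_fewer_runs": scheduled_runs_per_day < event_runs,
    }
```

At 300 issues per day with 4 scheduled runs, batching means 4 runs of 75 issues each instead of 300 runs. At 2 issues per day with an hourly schedule, the event trigger uses fewer runs.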

Key Takeaways

  • Decompose triage into three independent operations (summarize, label, route) — each can be automated and validated separately
  • Safe outputs with volume limits and content sanitization enable continuous agent operation without risk of unbounded writes
  • Use the classify-then-route pattern to specialize prompts per issue category rather than one prompt for all types
  • Start read-only, prove accuracy, then progressively enable write operations
  • Batch scheduling reduces cost and context load compared to per-event triggers on high-volume repos

Example

A repository uses GitHub Agentic Workflows to triage every new issue through a three-stage pipeline: summarize, label, and route.

Workflow file (.github/workflows/triage.md):

on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: write
  models: read
safe-outputs:
  - add-label:
      allowed-labels: [bug, feature, docs, duplicate, needs-info]
  - add-comment:
      max-count: 1
  - add-assignee:
      max-count: 1

Prompt instructions (.github/prompts/triage-classify.prompt.md):

Classify this issue into exactly one category:

- **bug** — the user reports broken behavior with reproduction steps or error output
- **feature** — the user requests new functionality or a change to existing behavior
- **docs** — the issue concerns documentation: typos, missing pages, or unclear instructions
- **duplicate** — the issue restates a problem already tracked in an open issue
- **needs-info** — the issue lacks enough detail to classify; request reproduction steps or expected behavior

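Result: when a new issue opens, the workflow runs the classifier prompt against the issue body, applies the matching label, posts a structured summary comment, and assigns the issue to the team mapped to that label. The whole pipeline runs within the safe-output constraints: labels come only from the allow-list, and each run writes at most one comment and one assignee. The single-label behavior comes from the prompt's "exactly one category" instruction, since the add-label entry sets no max-count.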
