Tool Description Quality

Tool descriptions — not just tool implementations — determine whether agents select the right tool for a task. Treating descriptions as prompt engineering surfaces is a direct multiplier on task success rate.

Also known as: Tool Selection Guidance, Selection Signals

Selection as a reasoning step

Agents do not browse a tool catalog before acting. They select a tool by reasoning about which one best matches their current intent. A poorly described tool is invisible for the use cases its description fails to communicate, even when the implementation would handle them correctly.

Anthropic's multi-agent research system post reports that improving tool ergonomics, including descriptions, cut task completion time by 40% for agents using the updated tools.

Here is the mechanism. Tool descriptions sit in the agent's context at the reasoning step. Richer, more distinctive descriptions create stronger semantic signals that match agent intent to the correct tool. Research on tool-level retrieval for multi-agent systems confirms this: coarse descriptions cluster functionally different tools together in embedding space, which makes correct selection unreliable.
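
The clustering effect can be illustrated with a toy similarity measure. The sketch below uses bag-of-words cosine similarity as a crude stand-in for a learned embedding model, so the numbers are only indicative. The tool names and description texts are illustrative assumptions, not taken from a real server.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Coarse descriptions: nearly the same words for two different tools.
coarse = {
    "search_issues": "Search for issues in the project tracker.",
    "list_issues": "List issues in the project tracker.",
}
# Richer descriptions: each one says what makes the tool distinct.
rich = {
    "search_issues": "Search for issues in the project tracker by keyword "
                     "or field filter such as status or assignee.",
    "list_issues": "List issues in the project tracker. Returns every issue "
                   "of one project without filtering.",
}

coarse_sim = cosine(bag_of_words(coarse["search_issues"]), bag_of_words(coarse["list_issues"]))
rich_sim = cosine(bag_of_words(rich["search_issues"]), bag_of_words(rich["list_issues"]))
print(f"coarse pair similarity: {coarse_sim:.2f}")
print(f"rich pair similarity:   {rich_sim:.2f}")
```

Under this toy measure the coarse pair scores as more similar than the rich pair, which is the direction the retrieval research describes: the less a description says about what is distinctive, the harder the two tools are to tell apart.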

Instruct agents to examine tools first

When a tool set includes both generic and specialized tools, agents tend to match on the first plausible tool, often a generic one. An agent that defaults to a generic search tool when a specialized one is available produces lower-quality results. To counter this, state the preference explicitly in the system prompt: "Before acting, review your available tools and select the one that best matches the task. Prefer specialized over generic tools."

MCP server tool descriptions

MCP servers expose many tools at once. Unclear descriptions at this scale cause systematic misuse: every agent makes the same wrong selection, and the error compounds across all invocations. For MCP tools:

Write each description so it stands on its own, because agents may not have context from adjacent tools
Do not assume agents read related tools before selecting the current one
Include domain context in each description, not just in a top-level server description
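
One way to catch descriptions that lean on their neighbors is a small lint pass over the tool list before publishing a server. The following is a minimal sketch under stated assumptions: the phrase list, the `min_words` threshold, and the `lint_description` helper are all hypothetical heuristics, not part of MCP or any library.

```python
import re

# Phrases that only make sense if the agent has read a neighboring tool.
# Illustrative list; extend it from the patterns seen in your own server.
DANGLING_REFERENCES = re.compile(
    r"\b(see above|as above|same as|like the previous)\b", re.IGNORECASE
)


def lint_description(tool, min_words=8):
    """Return a list of warnings for one tool definition (heuristics only)."""
    description = tool.get("description", "")
    warnings = []
    if len(description.split()) < min_words:
        warnings.append("description is very short; add domain context and when to use it")
    if DANGLING_REFERENCES.search(description):
        warnings.append("description depends on another tool's text; restate the context here")
    return warnings


tools = [
    {"name": "close_issue", "description": "Same as reopen_issue but closes."},
    {
        "name": "search_issues",
        "description": "Search for issues in the project tracker by keyword or field filter.",
    },
]
for tool in tools:
    for warning in lint_description(tool):
        print(f"{tool['name']}: {warning}")
```

A check like this only flags obvious gaps. It does not replace measuring selection against ground truth.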

Testing tool selection

Tool selection failures are often invisible during development. An agent that calls the wrong tool and returns a plausible-looking result hides the error until you compare it against ground truth. To test selection:

Instrument agent traces and log which tool was selected for each task type
Compare the selected tools against ground truth for a representative set of test cases
Refine descriptions from the misselection patterns you observe, not from intuition about what descriptions should say
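
The steps above can be sketched as a small report over logged traces. The trace format, task ids, and `selection_report` helper are assumptions made for illustration; real traces would come from whatever instrumentation your agent framework provides.

```python
from collections import Counter

# Hypothetical trace records: (task_id, tool the agent selected).
traces = [
    ("t1", "search_issues"),
    ("t2", "list_issues"),
    ("t3", "list_issues"),
    ("t4", "search_issues"),
]
# Hand-labeled ground truth: task_id -> tool that should have been selected.
expected = {
    "t1": "search_issues",
    "t2": "list_issues",
    "t3": "search_issues",
    "t4": "search_issues",
}


def selection_report(traces, expected):
    """Return (accuracy, Counter of (expected_tool, selected_tool) misselections)."""
    confusions = Counter()
    correct = 0
    for task_id, selected in traces:
        want = expected[task_id]
        if selected == want:
            correct += 1
        else:
            confusions[(want, selected)] += 1
    accuracy = correct / len(traces) if traces else 0.0
    return accuracy, confusions


accuracy, confusions = selection_report(traces, expected)
print(f"selection accuracy: {accuracy:.0%}")
for (want, got), count in confusions.most_common():
    print(f"expected {want}, agent chose {got}: {count}x")
```

The confusion pairs are the useful output: a recurring (expected, chosen) pair points at the two descriptions that need to be differentiated.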

Iterating on descriptions

Description iteration follows the same pattern as prompt iteration: observe, identify failures, change, measure. The most common failure mode is a description accurate enough to say what the tool does but not specific enough to tell the agent when to prefer it over alternatives.

The fix is positive selection signals: "Use this tool when X" and "Prefer this over [other tool] when Y." These are instructions to the agent, not documentation of the interface.
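
To keep selection signals from being left out, a description can be assembled from named parts. This is a sketch only: `build_description` and its parameters are hypothetical, and the wording templates are one possible phrasing.

```python
def build_description(summary, returns, use_when, prefer_over=None):
    """Compose a tool description from capability text plus selection signals.

    prefer_over maps another tool's name to the condition under which this
    tool is the better choice.
    """
    parts = [summary, f"Returns {returns}", f"Use this tool when {use_when}"]
    for other_tool, condition in (prefer_over or {}).items():
        parts.append(f"Prefer this over {other_tool} when {condition}")
    # Normalize so every part ends with exactly one period.
    return " ".join(part.strip().rstrip(".") + "." for part in parts)


description = build_description(
    summary="Search for issues in the project tracker",
    returns="a list of matching issues, each with id, title, status, and assignee",
    use_when="you need to find issues by keyword or field filter",
    prefer_over={"list_issues": "you have a search term or filter criteria"},
)
print(description)
```

Making `use_when` a required argument means a description cannot be written without stating when the tool applies.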

Example

The following pair shows the same MCP tool with a weak description and an improved one. The weak version is accurate but leaves selection decisions to the agent.

# Before: accurate but minimal — agent must guess when to use it
{
    "name": "search_issues",
    "description": "Search for issues in the project tracker.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"}
        },
        "required": ["query"]
    }
}

# After: includes query syntax, return shape, and when to prefer it over alternatives
{
    "name": "search_issues",
    "description": (
        "Search for issues in the project tracker. "
        "Returns a list of issues matching the query, each with id, title, status, and assignee. "
        "Supports field filters: status:open, status:closed, assignee:<username>, label:<name>. "
        "Use this tool to find issues by keyword or filter. "
        "Prefer this over list_issues when you have a search term or filter criteria. "
        "Use list_issues instead when you need all issues in a project without filtering."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search keywords and/or field filters. Example: 'login page status:open assignee:alice'"
            }
        },
        "required": ["query"]
    }
}

The improved description answers three questions the minimal one leaves open: what the tool does, what it returns, and when to use it instead of list_issues. The query syntax example in the parameter description removes trial and error on filter format.

When this backfires

Each description adds tokens to every request that carries the tool list, and better wording cannot fix every selection problem. Watch for three cases:

Large MCP servers (50+ tools): verbose descriptions push tool context above 10k tokens. Use retrieval-based selection (embedding search to pick a subset) instead of in-context enumeration.
High-frequency loops: verbose descriptions add cost with diminishing returns once selection stabilizes.
Genuinely similar tools: description quality cannot resolve near-identical tools. Consolidate or differentiate them at the implementation level. See Consolidate Agent Tools.
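
Retrieval-based selection can be sketched as follows. Word overlap stands in here for the embedding search a production system would use, and the tool list, the `select_tools` helper, and the value of `k` are illustrative assumptions.

```python
import re


def tokens(text):
    """Lowercased word set; a stand-in for an embedding of the text."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def select_tools(task, tools, k=2):
    """Rank tools by word overlap between the task and each description; keep the top k."""
    task_words = tokens(task)
    ranked = sorted(
        tools,
        key=lambda tool: len(task_words & tokens(tool["description"])),
        reverse=True,
    )
    return ranked[:k]


tools = [
    {"name": "search_issues", "description": "Search for issues in the project tracker by keyword or filter."},
    {"name": "create_invoice", "description": "Create a billing invoice for a customer account."},
    {"name": "list_issues", "description": "List all issues in a project without filtering."},
    {"name": "send_email", "description": "Send an email message to one or more recipients."},
]

# Only the shortlist is placed in the agent's context, not the full catalog.
shortlist = select_tools("find open issues in the tracker about the login page", tools, k=2)
print([tool["name"] for tool in shortlist])
```

Retrieval still depends on description quality: the ranking is computed from the same text, so distinctive descriptions matter at this stage too.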

Key Takeaways

Tool description quality is a direct performance lever — improving tool ergonomics (including descriptions) reduced task completion time by 40% in one case
Instruct agents in the system prompt to prefer specialized tools over generic ones
MCP server tools require self-contained descriptions; do not assume agents read adjacent tool docs
Test tool selection explicitly by logging which tools are selected for which tasks
Add positive selection signals ("use this when..."), not just capability descriptions
At large tool set sizes (50+ tools), prefer retrieval-based selection over in-context enumeration to manage token cost