Consolidate Agent Tools

Prefer fewer, higher-level tools that match how agents reason about tasks over many narrow tools that mirror API endpoint boundaries.

Learn it hands-on with the Consolidation vs Sprawl guided lesson, which includes quizzes.

The problem with API-shaped tool sets

Developers building tool-augmented agents often mirror the underlying API: one tool per endpoint, one tool per operation. This produces a large tool set. Agents then chain several calls to finish a single logical action. Finding a calendar slot and booking it needs two separate tools, two decisions, and two chances for error.

Agents pick tools by reading their descriptions and matching intent to a tool. A large set of overlapping or fine-grained tools makes that choice ambiguous: the agent has to work out which combination of tools reaches the goal, instead of picking the one tool that matches its intent. Anthropic's post on writing tools for agents reports that more tools do not always lead to better outcomes. Careful selection beats abundance.

Consolidation principle

Each tool should map to a distinct, human-understandable sub-task. If two tools are always called together, they should be one tool. If a tool's output always feeds into another specific tool, merge them.

Example: Instead of list_calendar_events + create_calendar_event, define a single schedule_event tool that finds availability and books in one call. The agent expresses intent ("schedule a meeting") and the tool handles the mechanics.
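A minimal sketch of what such a consolidated tool could look like, assuming a calendar represented as a plain list of (start, end) pairs. The function signature, the return shape, and the in-memory calendar are illustrative assumptions, not a real calendar API:

```python
from datetime import datetime, timedelta


def schedule_event(title, duration_minutes, window_start, window_end, events):
    """Find the first free slot in the window and book it, in one call.

    `events` is a hypothetical in-memory calendar: a list of (start, end)
    datetime pairs. A real tool would call the calendar API here instead.
    """
    duration = timedelta(minutes=duration_minutes)
    cursor = window_start
    for start, end in sorted(events):
        if end <= cursor:
            continue  # event ends before the slot we are considering
        if start - cursor >= duration:
            break  # the gap before this event is large enough
        cursor = end  # move past the busy block
    if cursor + duration > window_end:
        return {"status": "no_availability", "title": title}
    events.append((cursor, cursor + duration))  # the "create" half of the merge
    return {
        "status": "booked",
        "title": title,
        "start": cursor.isoformat(),
        "end": (cursor + duration).isoformat(),
    }


calendar = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 10, 0)),
    (datetime(2025, 1, 6, 10, 30), datetime(2025, 1, 6, 12, 0)),
]
result = schedule_event(
    "Sync", 30, datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 13, 0), calendar
)
```

The agent makes one decision (schedule) and gets back either a booked slot or a clear "no availability" status, rather than listing events and then deciding where to create one.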

Overlapping functions

Overlapping tool functions produce two failure modes:

  1. Redundant calls — the agent calls both tools when one would have been enough
  2. Wrong tool selection — the agent picks the less appropriate tool because the distinction is unclear

Remove overlap by giving each tool a responsibility no other tool shares. If you cannot say in one sentence what each tool does that no other tool does, the set still overlaps.

Namespace grouping

When you do need several related tools, group them under a common namespace prefix. For example: asana_search, asana_projects_search, asana_task_create. The prefix tells the agent that these tools work on the same system, and it reduces confusion when the agent picks between them.

This works better than flat naming (search, project_search, create_task), where the relationship between tools stays implicit.
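As a rough illustration, a small helper can group tool names by their leading prefix to check whether a tool set is namespaced consistently. The helper name and the convention that the prefix is everything before the first underscore are assumptions made for this sketch:

```python
from collections import defaultdict


def group_by_namespace(tool_names):
    """Group tool names by the text before the first underscore.

    Names without an underscore land under the empty-string key,
    which flags them as having no namespace at all.
    """
    groups = defaultdict(list)
    for name in tool_names:
        prefix, sep, _rest = name.partition("_")
        groups[prefix if sep else ""].append(name)
    return dict(groups)


namespaced = group_by_namespace(
    ["asana_search", "asana_projects_search", "asana_task_create"]
)
flat = group_by_namespace(["search", "project_search", "create_task"])
```

With the namespaced names, all three tools fall under the single key "asana". With the flat names, each tool lands in its own group, which mirrors what the agent sees: nothing in the names ties the tools to one system.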

Why it works

LLMs pick tools by reading their descriptions in the context window. When ten narrow tool descriptions compete for attention, the model has to reason about which subset reaches the goal. That is a multi-step inference problem stacked on top of the real task. Fewer, well-scoped tools turn the choice into a direct mapping: intent to tool, rather than intent to a combination of tools.

The benefit is not only ergonomic. LongFuncEval (2025) found that growing the tool catalog dropped accuracy by 7% to 85%, depending on the model, and showed a strong lost-in-the-middle effect (Liu et al., 2023): the correct tool gets harder to find among distractors. Consolidation removes the distractors at the source, rather than relying on the model to filter them.

Context window impact

Each tool definition uses context tokens. A large set of narrow tools spends those tokens on definitions the agent may never use in a given task. Consolidating tools shrinks that footprint roughly in proportion to the definitions removed, which leaves more context for task data and reasoning (see token-efficient tool design).

This matters most in long, multi-step tasks, where context pressure builds up and every token spent on unused tool definitions is unavailable for the work itself.
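To make the footprint concrete, the sketch below compares a five-tool and a two-tool definition set using a crude size estimate. The four-characters-per-token ratio is an assumption for illustration only; real counts depend on the model's tokenizer and on how the provider serializes tool definitions:

```python
import json

CHARS_PER_TOKEN = 4  # crude assumption; actual tokenizers vary


def estimate_tokens(tools):
    """Approximate the context cost of a list of tool definitions."""
    serialized = json.dumps(tools, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


api_shaped = [
    {"name": "search_flights", "description": "Search available flights by route and date"},
    {"name": "get_flight_details", "description": "Get seat map and baggage policy for a flight"},
    {"name": "hold_flight", "description": "Place a temporary hold on a flight"},
    {"name": "create_booking", "description": "Book a held flight with passenger details"},
    {"name": "send_confirmation", "description": "Email booking confirmation to the passenger"},
]
consolidated = [
    {"name": "find_flights", "description": "Search flights and return options with full details (seat map, baggage)"},
    {"name": "book_flight", "description": "Hold, book, and send confirmation for a selected flight"},
]

before = estimate_tokens(api_shaped)
after = estimate_tokens(consolidated)
```

Real definitions also carry parameter schemas, which usually dominate the size, so the gap between the two sets tends to be wider in practice than this description-only comparison suggests.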

When not to consolidate

Consolidation has limits, and it backfires in some cases. Do not merge tools that:

  • Serve genuinely distinct sub-tasks that are not always done together — forcing the agent to call a merged tool when it needs only one sub-operation wastes tokens and hides intent
  • Have very different permission requirements — combining them grants excess access to every caller, whatever sub-task they need
  • Have output schemas so different that a merged interface makes no sense — the agent cannot reliably pattern-match on the response

Over-consolidation has its own drawbacks. A merged tool that handles too much becomes a black box: when it fails, the agent cannot tell which step failed. A merged find_and_book_flight that silently fails at the hold step looks identical to one that fails at confirmation. Narrow tools keep that failure granularity; the merged tool trades it away for fewer calls.

Here is the test. Does the merged tool still map to a single, clear, human-understandable action? If it takes a paragraph to describe, it has been over-consolidated. If two sub-tasks are sometimes called together but not always, keep them separate and let the agent compose them.
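Where a merge does pass this test, one way to keep some failure granularity is for the merged tool to name the step that failed in its result. This is a suggested mitigation, not something prescribed above, and the helper name and result shape are assumptions for the sketch:

```python
def run_steps(steps):
    """Run (name, callable) steps in order and report the first failure.

    Returns a dict the agent can read to see how far the merged
    operation got, instead of a bare success/failure flag.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            return {
                "ok": False,
                "failed_step": name,
                "completed": completed,
                "error": str(exc),
            }
        completed.append(name)
    return {"ok": True, "completed": completed}


def succeed():
    pass


def hold_fails():
    raise RuntimeError("hold expired")


outcome = run_steps([("search", succeed), ("hold", hold_fails), ("confirm", succeed)])
```

Here the result identifies "hold" as the failing step and lists "search" as completed, so a failure at the hold step no longer looks the same as a failure at confirmation.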

Example

A travel-booking agent starts with five tools mirroring the REST API:

tools:
  - name: search_flights
    description: Search available flights by route and date
  - name: get_flight_details
    description: Get seat map and baggage policy for a flight
  - name: hold_flight
    description: Place a temporary hold on a flight
  - name: create_booking
    description: Book a held flight with passenger details
  - name: send_confirmation
    description: Email booking confirmation to the passenger

The agent must chain all five in the correct order, deciding at each step which tool comes next. In practice, get_flight_details is always called after search_flights, hold_flight always precedes create_booking, and send_confirmation always follows create_booking.

After consolidation:

tools:
  - name: find_flights
    description: Search flights and return options with full details (seat map, baggage)
  - name: book_flight
    description: Hold, book, and send confirmation for a selected flight

Two tools, two clear intents. find_flights merges search and detail retrieval — they were always called together. book_flight merges the hold-book-confirm chain — the agent never holds without booking or books without confirming. The agent now selects between "find" and "book" instead of reasoning about a five-step pipeline.
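The find_flights half of this merge can be sketched as a thin wrapper over the two original operations. The in-memory data and the two backend functions below are hypothetical stand-ins for the REST endpoints, and the field names are assumptions:

```python
# Hypothetical in-memory stand-ins for the search and details endpoints.
_ROUTES = {("SFO", "JFK", "2025-03-01"): ["UA100", "DL200"]}
_DETAILS = {
    "UA100": {"seat_map": "3-3", "baggage": "1 checked bag"},
    "DL200": {"seat_map": "2-4-2", "baggage": "carry-on only"},
}


def search_flights(origin, destination, date):
    """Backend call: return matching flight IDs."""
    return list(_ROUTES.get((origin, destination, date), []))


def get_flight_details(flight_id):
    """Backend call: return seat map and baggage policy for one flight."""
    return dict(_DETAILS[flight_id])


def find_flights(origin, destination, date):
    """Consolidated tool: search, then attach full details to every option."""
    return [
        {"flight_id": flight_id, **get_flight_details(flight_id)}
        for flight_id in search_flights(origin, destination, date)
    ]


options = find_flights("SFO", "JFK", "2025-03-01")
```

The two backend functions still exist, but only find_flights is exposed to the agent, so the search-then-details sequencing moves out of the agent's reasoning and into the tool.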

Key Takeaways

  • More tools do not improve agent outcomes — fewer, well-scoped tools reduce selection ambiguity
  • Each tool should map to one distinct sub-task; tools always called together should be one tool
  • Eliminate overlapping functions to prevent redundant calls and wrong tool selection
  • Use namespace prefixes to group related tools and signal system relationships
  • Consolidation also reduces context window consumption — a structural performance benefit