In-Agent Task Prioritization
Prioritization is the agent's ranking of pending tasks by composite score — distinct from routing (who) and scheduling (when).
In-agent task prioritization is the decision an agent makes, every turn, about which pending item to advance next. It is structurally distinct from parsimonious agent routing (which worker handles a task) and from scheduling (when a task runs); prioritization is the agent's own next-action ranking over work it has already accepted. Antonio Gulli treats it as a first-class pattern in Agentic Design Patterns (Chapter 20).
When This Pattern Pays Off
The pattern earns its complexity under four conditions; outside them, FIFO is the correct answer.
| Condition | Why ranking pays back |
|---|---|
| Scarce attention, not scarce CPU | The per-turn attention budget is the bottleneck; reordering changes marginal value per turn |
| Long backlog or long sessions | Head-of-line blocking compounds — low-value items consume slots before high-value ones |
| Heterogeneous item value | A 10× spread in expected payoff makes ranking strictly dominate arrival order |
| Estimable signals | Urgency, dependency, or blast radius can be derived from state without guessing |
When the constraint is resource contention rather than attention, the answer is strict FIFO plus lane-based execution queueing. Block's agent-task-queue ships exactly that — "FIFO Queuing: Strict first-in-first-out ordering within each exact queue_name" — because it serialises expensive operations to keep one machine responsive, not to maximise per-turn payoff.
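The FIFO-plus-lanes alternative can be sketched in a few lines. This is a minimal illustration of the idea only, not the API of Block's agent-task-queue: the `LaneQueues` class, its method names, and the lane names are all hypothetical.

```python
from collections import defaultdict, deque


class LaneQueues:
    """Strict FIFO within each named lane; lanes are independent of one another."""

    def __init__(self):
        self._lanes = defaultdict(deque)

    def submit(self, queue_name, task):
        # Append to the tail of the lane; arrival order is the only ordering.
        self._lanes[queue_name].append(task)

    def next_task(self, queue_name):
        """Pop the oldest task in the lane, or return None if the lane is empty."""
        lane = self._lanes.get(queue_name)
        return lane.popleft() if lane else None


queues = LaneQueues()
queues.submit("build", "compile-a")
queues.submit("test", "pytest-a")
queues.submit("build", "compile-b")

first_build = queues.next_task("build")
second_build = queues.next_task("build")
first_test = queues.next_task("test")
empty = queues.next_task("build")
```

Nothing here scores or reorders work; a lane only serialises it, which is the point when the constraint is contention rather than attention.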
Ranking Signals
A composite score combines several dimensions so no one signal dominates:
- Urgency — deadline proximity or external state changes that age out the item.
- Economic value — expected payoff if completed (the per-task equivalent of economic-value signalling, which carries the same signal across agents).
- Dependency / unblocking — items that unblock other waiting work; ranking by transitive unblock count is what an autonomous backlog agent does instead of issue-number order.
- Blast radius — irreversibility or scope; some teams invert this signal and rank irreversible work last to keep options open.
- Staleness: items whose underlying facts are going out of date, so the work loses accuracy or relevance the longer it waits.
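Of these signals, the transitive unblock count is the one that needs an actual graph traversal. A minimal sketch, assuming the backlog exposes a `dependents` mapping from each item to the items waiting directly on it (the mapping, the item ids, and the function name are illustrative):

```python
def transitive_unblock_count(item, dependents):
    """Count every item reachable downstream of `item` in the dependency graph.

    `dependents` maps an item id to the ids that wait directly on it.
    """
    seen = set()
    stack = list(dependents.get(item, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already counted via another path
        seen.add(current)
        stack.extend(dependents.get(current, ()))
    seen.discard(item)  # guard against cycles that lead back to the start
    return len(seen)


# Hypothetical backlog: A unblocks B and C, which both unblock D; E stands alone.
dependents = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
    "E": [],
}
counts = {item: transitive_unblock_count(item, dependents) for item in dependents}
ranked = sorted(dependents, key=lambda i: counts[i], reverse=True)
```

Here `A` ranks first because finishing it eventually frees three other items, even though only two wait on it directly.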
Why It Works
Per-turn attention is the scarce resource, and arrival order has no relationship to marginal value. Pure FIFO produces head-of-line blocking — a high-value program waits behind low-value calls. Autellix identifies this as the dominant inefficiency in LLM-agent workloads and reports 4–15× program-throughput gains at equivalent latency from program-aware priority scheduling. A composite score reorders the queue so each turn pays back more of the goal; the mechanism is attention-budget allocation under head-of-line blocking, not "agents need lists".
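A toy calculation makes the head-of-line effect concrete. The numbers below are invented for illustration and are unrelated to Autellix's measurements; each item is a hypothetical `(value, turns)` pair, and the agent works the queue in order until the next item no longer fits its turn budget.

```python
def value_delivered(queue, turn_budget):
    """Sum the value of items completed in queue order within the turn budget."""
    total = 0
    turns_left = turn_budget
    for value, turns in queue:
        if turns > turns_left:
            break  # the item at the head does not fit, so everything behind it waits
        total += value
        turns_left -= turns
    return total


# Two cheap-to-skip, low-value items arrived before two high-value ones.
arrival_order = [(1, 3), (1, 3), (10, 2), (8, 2)]

# Rank by value per turn, a simple stand-in for a composite score.
ranked_order = sorted(arrival_order, key=lambda item: item[0] / item[1], reverse=True)

fifo_value = value_delivered(arrival_order, turn_budget=6)
ranked_value = value_delivered(ranked_order, turn_budget=6)
```

With a budget of six turns, arrival order spends the whole budget on the two low-value items (total value 2), while the ranked order completes both high-value items first (total value 18).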
When This Backfires
Three failure modes recur:
- Starvation of low-priority tasks. Pure top-K priority leaves low-priority items waiting indefinitely; HEXGEN-FLOW documents this for agentic text-to-SQL and applies aging — promote an item's priority after it has waited past a threshold, borrowed from Solaris TS and MLFQ. Without aging, the queue eats its own tail.
- Thrashing from constant re-ranking. Recomputing scores every turn flips order under noise and the agent never finishes anything. Re-rank on state change, not on every turn; debounce.
- Gaming a single signal. When one dimension (self-declared urgency) is the only input, the system optimises that signal at the expense of throughput. Cap per-signal weight, or split high-stakes work into a separate lane rather than racing it through the main queue.
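Two of these mitigations, aging past a threshold and capping each signal's contribution, fit in one small scoring function. This is a sketch under stated assumptions: the constants, the `Item` fields, and the choice to leave the aging term uncapped (so a long-waiting item eventually wins) are illustrative, not taken from HEXGEN-FLOW or any other cited system.

```python
from dataclasses import dataclass

AGING_THRESHOLD = 5      # turns an item may wait before promotion starts (assumed)
AGING_STEP = 0.5         # priority added per turn waited beyond the threshold (assumed)
MAX_CONTRIBUTION = 2.0   # ceiling on what any single signal can add (assumed)


@dataclass
class Item:
    name: str
    value: float
    declared_urgency: float
    waiting_turns: int = 0


def capped(signal, limit=MAX_CONTRIBUTION):
    """Clamp one signal so it cannot dominate the composite score."""
    return min(signal, limit)


def priority(item):
    # Aging stays uncapped on purpose: it must eventually outweigh everything else.
    overdue = max(0, item.waiting_turns - AGING_THRESHOLD)
    return capped(item.value) + capped(item.declared_urgency) + AGING_STEP * overdue


loud = Item("loud", value=0.5, declared_urgency=100.0)   # inflated self-declared urgency
solid = Item("solid", value=2.0, declared_urgency=1.0)
stale = Item("stale", value=0.1, declared_urgency=0.0, waiting_turns=12)
```

The inflated urgency on `loud` is clamped, so `solid` still outranks it, and `stale` overtakes both once it has waited long enough past the threshold.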
Example
A backlog agent processing the issue tracker via labels-as-locks defaults to issue-number order. Replace the default ordering with a composite score, computed once per scan, and re-evaluate on state change rather than per turn:
```python
def score(issue):
    return (
        2.0 * unblock_count(issue)     # transitive items waiting on this one
        + 1.5 * value_estimate(issue)  # tag-derived expected payoff
        + 1.0 * urgency(issue)         # deadline + staleness
        + 0.2 * waiting_turns(issue)   # aging: anti-starvation boost
    )

next_issue = max(ready_issues, key=score)
```
The waiting_turns term is the aging mitigation; without it, an issue with low value and zero dependents never runs. Capping each weight at roughly 2× limits how far any one signal can move the score, which mitigates the gaming failure mode, provided the signals are normalised to comparable ranges. Re-rank when an item moves to ready or an upstream item completes, not every turn, to avoid thrashing.
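Re-ranking on state change rather than per turn can be sketched as a cached ranking with a dirty flag. The `RankedBacklog` class, its method names, and the dict-shaped issues are hypothetical; the scoring function is passed in so any composite score can be plugged in.

```python
class RankedBacklog:
    """Caches the ranking and recomputes it only after a state-change event."""

    def __init__(self, score):
        self._score = score
        self._ready = []
        self._ranking = []
        self._dirty = False
        self.rerank_count = 0  # exposed only to make the behaviour observable

    def mark_ready(self, issue):
        # State change: a new item became ready.
        self._ready.append(issue)
        self._dirty = True

    def complete(self, issue):
        # State change: finishing an item may unblock or reorder others.
        self._ready.remove(issue)
        self._dirty = True

    def next_issue(self):
        if self._dirty:
            self._ranking = sorted(self._ready, key=self._score, reverse=True)
            self._dirty = False
            self.rerank_count += 1
        return self._ranking[0] if self._ranking else None


backlog = RankedBacklog(score=lambda issue: issue["value"])
low = {"id": 1, "value": 1.0}
high = {"id": 2, "value": 5.0}
backlog.mark_ready(low)
backlog.mark_ready(high)

picks = [backlog.next_issue() for _ in range(3)]  # three turns, no state change
reranks_before = backlog.rerank_count

backlog.complete(high)
after_completion = backlog.next_issue()
reranks_after = backlog.rerank_count
```

Three consecutive turns reuse one ranking; only the completion event triggers a second sort, so noise in the scores cannot flip the order between turns.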
Key Takeaways
- Prioritization is a different decision from routing (who) and scheduling (when) — name it as its own concern so teams can encode urgency, value, and dependency signals into the loop.
- The pattern earns its complexity when attention is scarce, the backlog is long, value spread is wide, and ranking signals are estimable. Otherwise FIFO plus lane isolation is correct — Block's agent-task-queue is the worked example of that choice.
- Composite scores beat single signals; aging beats pure priority; re-rank on state change, not per turn.
- Head-of-line blocking on the attention budget is the mechanism the pattern attacks; Autellix reports 4–15× program-throughput gains from program-aware priority over FIFO.