Normal path: execute → tool → observe.
Quick take
- Tool spam is usually a loop: same tool + same args, repeated until budgets burn.
- Put budgets and dedupe in the tool gateway, not in the prompt.
- Cache/dedupe by (tool_name, canonical_args_hash) inside a run.
- Make the stop reason user-visible (so they don’t mash refresh).
Problem-first intro
Your agent is “working”.
The logs say:
- search.read called 47 times
- http.get called 19 times
- request still timed out
The user sees: nothing.
You see: a bill.
Tool spam is one of the most common “first production incidents” because it doesn’t look catastrophic. It looks like “the agent is trying hard”. In reality it’s usually a loop with prettier text.
Why this fails in production
Tool spam is almost never caused by one thing. It’s an ecosystem of small mistakes.
1) No tool-call budget (or only a step budget)
A step budget doesn’t help if one “step” can call 5 tools. You need all four:
- max steps
- max tool calls
- max time
- max spend
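A minimal sketch of keeping all four limits in one object, so the run loop has a single place to ask "should I stop, and why?". `RunBudget`, its field names, and the default values are hypothetical, not taken from any framework. The clock is injectable so the time limit can be tested without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunBudget:
    # Illustrative limits; tune them per product.
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: float = 60.0
    max_spend_usd: float = 0.50
    clock: Callable[[], float] = time.monotonic
    # Counters the run loop increments as it goes.
    steps: int = 0
    tool_calls: int = 0
    spend_usd: float = 0.0
    started_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def exceeded(self) -> Optional[str]:
        """Return the name of the first exhausted limit, or None."""
        if self.steps > self.max_steps:
            return "max_steps"
        if self.tool_calls > self.max_tool_calls:
            return "max_tool_calls"
        if self.clock() - self.started_at > self.max_seconds:
            return "max_seconds"
        if self.spend_usd > self.max_spend_usd:
            return "max_spend"
        return None
```

The returned string doubles as a stop reason you can log and show to the user.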
2) The tool output is slightly nondeterministic
Search is nondeterministic. Web pages change. Time-based results reorder.
If the agent expects “same input → same output”, it will keep trying until it “feels confident”. Confidence isn’t a stop condition.
3) No dedupe window
If the agent calls the same tool with the same args, that’s not “thoroughness”. That’s a bug.
The fix is boring: cache tool calls by (tool_name, args_hash) within a run (or within a short window).
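A plain hash only dedupes calls whose args serialize identically. Agents often re-issue the "same" call with different casing, extra whitespace, or an explicit null. A sketch of canonicalizing args before hashing, assuming case and whitespace are irrelevant for the tool in question (true for many search-style read tools, not for all tools). `canonicalize` and `dedupe_key` are hypothetical helper names.

```python
import hashlib
import json
from typing import Any


def canonicalize(value: Any) -> Any:
    """Normalize args so trivially different calls share one key.

    Assumption: case and whitespace do not change the tool's result.
    """
    if isinstance(value, str):
        return " ".join(value.split()).lower()
    if isinstance(value, dict):
        # Drop None values so {"page": None} and {} are treated the same.
        return {k: canonicalize(v) for k, v in value.items() if v is not None}
    if isinstance(value, (list, tuple)):
        return [canonicalize(v) for v in value]
    return value


def dedupe_key(tool_name: str, args: dict[str, Any]) -> str:
    raw = json.dumps(canonicalize(args), sort_keys=True, ensure_ascii=False)
    return f"{tool_name}:{hashlib.sha256(raw.encode('utf-8')).hexdigest()}"
```

Keep the normalization per tool: lowercasing a search query is usually harmless, lowercasing a URL path or an ID is not.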
4) The agent has no “I already tried this” memory
Reactive loops need a scratchpad:
- “I searched for X”
- “I fetched Y”
- “This didn’t help because Z”
Without it, the agent re-discovers the same dead ends.
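One way to sketch that scratchpad: a small structure the loop writes to after each tool call and renders into the next prompt. `Attempt` and `Scratchpad` are hypothetical names, and the rendered format is just one option.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    action: str   # e.g. "search.read"
    target: str   # e.g. the query or URL
    outcome: str  # why it did or did not help


@dataclass
class Scratchpad:
    attempts: list[Attempt] = field(default_factory=list)

    def record(self, action: str, target: str, outcome: str) -> None:
        self.attempts.append(Attempt(action, target, outcome))

    def already_tried(self, action: str, target: str) -> bool:
        return any(a.action == action and a.target == target for a in self.attempts)

    def render(self) -> str:
        """Text block to include in the next model prompt."""
        if not self.attempts:
            return "Nothing tried yet."
        lines = [f"- {a.action}({a.target!r}): {a.outcome}" for a in self.attempts]
        return "Already tried:\n" + "\n".join(lines)
```

`already_tried` gives the loop a cheap guard in code, and `render` gives the model the same information in text, so neither side has to rediscover a dead end.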
5) Retries multiply spam
If the tool has retries and the agent also retries by re-issuing the call, you get:
- tool retry storm
- plus agent loop
That’s how you melt rate limits.
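The multiplication is easy to underestimate: if the agent re-issues a call 5 times and the tool retries 3 times per call, the upstream sees up to 15 requests for one logical lookup. A sketch of that arithmetic plus a single retry wrapper that lives in the gateway only. `ToolHTTPError`, `call_with_retry`, and the backoff values are hypothetical; the retryable status list mirrors the config snippet later on this page.

```python
import time
from typing import Any, Callable

RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


class ToolHTTPError(Exception):
    """Hypothetical error type carrying an HTTP-like status code."""

    def __init__(self, status: int):
        super().__init__(f"tool failed with status {status}")
        self.status = status


def worst_case_requests(agent_reissues: int, attempts_per_call: int) -> int:
    # Each agent-level re-issue triggers a full round of tool-level attempts.
    return agent_reissues * attempts_per_call


def call_with_retry(
    fn: Callable[[], Any],
    *,
    max_attempts: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry here, in one place; the agent should not re-issue on its own."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ToolHTTPError as err:
            # Give up on non-retryable errors or when attempts run out.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
```

With retries owned by the gateway, a failed call that reaches the agent is final, and the agent's job is to pick a different action or stop.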
Implementation example (real code)
This is a minimal “anti-spam” tool gateway:
- per-run tool-call budget
- per-tool dedupe window
- cheap caching by args hash
# Python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Callable


def stable_hash(obj: Any) -> str:
    # sort_keys makes the hash independent of dict key order.
    raw = json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


@dataclass
class ToolBudgets:
    max_calls: int = 12
    dedupe_window_s: int = 60


class ToolSpamDetected(RuntimeError):
    pass


class ToolGateway:
    def __init__(self, *, impls: dict[str, Callable[[dict[str, Any]], Any]], budgets: ToolBudgets):
        self.impls = impls
        self.budgets = budgets
        self.calls = 0
        self.cache: dict[str, tuple[float, Any]] = {}  # key -> (timestamp, value)

    def call(self, name: str, args: dict[str, Any]) -> Any:
        # Every call counts toward the budget, including ones served from cache.
        self.calls += 1
        if self.calls > self.budgets.max_calls:
            raise ToolSpamDetected(f"tool budget exceeded (calls={self.calls})")

        key = f"{name}:{stable_hash(args)}"
        now = time.time()
        hit = self.cache.get(key)
        if hit:
            ts, val = hit
            if now - ts <= self.budgets.dedupe_window_s:
                return val  # duplicate inside the window: skip the real tool

        fn = self.impls.get(name)
        if not fn:
            raise RuntimeError(f"unknown tool: {name}")
        val = fn(args)
        self.cache[key] = (now, val)
        return val

// JavaScript (Node.js)
import crypto from "node:crypto";

export class ToolSpamDetected extends Error {}

// Sort object keys recursively so key order does not change the hash.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (!value || typeof value !== "object") return value;
  if (value.constructor !== Object) return value; // best-effort; avoid reordering custom types
  const out = {};
  for (const k of Object.keys(value).sort()) out[k] = canonicalize(value[k]);
  return out;
}

export function stableHash(obj) {
  const raw = JSON.stringify(canonicalize(obj));
  return crypto.createHash("sha256").update(raw).digest("hex");
}

export class ToolGateway {
  constructor({ impls = {}, budgets = { maxCalls: 12, dedupeWindowS: 60 } } = {}) {
    this.impls = impls;
    this.budgets = budgets;
    this.calls = 0;
    this.cache = new Map(); // key -> { ts, val }
  }

  call(name, args) {
    // Every call counts toward the budget, including ones served from cache.
    this.calls += 1;
    if (this.calls > this.budgets.maxCalls) {
      throw new ToolSpamDetected("tool budget exceeded (calls=" + this.calls + ")");
    }

    const key = name + ":" + stableHash(args);
    const now = Date.now() / 1000;
    const hit = this.cache.get(key);
    // Duplicate inside the window: skip the real tool.
    if (hit && now - hit.ts <= this.budgets.dedupeWindowS) return hit.val;

    const fn = this.impls[name];
    if (!fn) throw new Error("unknown tool: " + name);
    const val = fn(args);
    this.cache.set(key, { ts: now, val });
    return val;
  }
}

This doesn’t “solve agents”. It solves one boring thing: repeated calls with the same args are served from the cache instead of hitting the real tool again. They still count toward the call budget, which is what eventually stops a loop.
You still need:
- loop detection in the agent loop
- stop reasons
- and a way to show partial results when budgets hit
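A rough sketch of how those three pieces can fit into the agent loop itself: count repeated action keys, return an explicit stop reason, and always hand back whatever was observed so far. `run_agent`, `StopReason`, `RunResult`, and the `next_action` / `execute` callables are hypothetical stand-ins for your model call and tool gateway.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional


class StopReason(str, Enum):
    DONE = "done"
    TOOL_BUDGET = "tool_budget"
    LOOP_DETECTED = "loop_detected"


@dataclass
class RunResult:
    stop_reason: StopReason
    observations: list[Any] = field(default_factory=list)  # partial results


def run_agent(
    next_action: Callable[[list[Any]], Optional[tuple[str, str]]],
    execute: Callable[[str, str], Any],
    *,
    max_tool_calls: int = 12,
    max_repeats: int = 2,
) -> RunResult:
    """next_action returns (tool_name, args_key), or None when finished."""
    seen = Counter()
    observations: list[Any] = []
    for _ in range(max_tool_calls):
        action = next_action(observations)
        if action is None:
            return RunResult(StopReason.DONE, observations)
        seen[action] += 1
        if seen[action] > max_repeats:
            # Same action key again: stop and hand back what we have.
            return RunResult(StopReason.LOOP_DETECTED, observations)
        observations.append(execute(*action))
    return RunResult(StopReason.TOOL_BUDGET, observations)
```

Whatever the stop reason, `observations` is what you build the partial answer from, and `stop_reason.value` is what you surface to the user and to your logs.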
Example incident (numbers are illustrative)
Example: a support agent that used search.read to find relevant KB pages.
During a vendor search outage, results became unstable (timeouts + partial responses). The agent interpreted that as “not enough confidence” and kept searching.
Impact (one morning):
- avg tool calls per run: 3 → 28
- rate limits triggered and degraded other services
- model + tool spend: +$310 that day (mostly wasted)
Fix:
- per-run tool-call budgets (hard stop)
- dedupe window keyed by tool+args
- safe-mode response: “I can’t search right now; here’s what I know without it”
- alerting on tool_calls/run spikes
Tool spam isn’t “the model being curious”. It’s missing brakes.
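The alerting piece can start very small. A sketch that compares recent tool calls per run against a baseline and flags a spike; `tool_call_spike` and the 3x threshold are assumptions for illustration, and in practice this would be a query in whatever metrics system you already run.

```python
from statistics import mean


def tool_call_spike(
    baseline_runs: list[int],
    recent_runs: list[int],
    *,
    ratio_threshold: float = 3.0,
) -> bool:
    """True when recent avg tool calls per run reaches the ratio over baseline."""
    if not baseline_runs or not recent_runs:
        return False  # not enough data to judge
    baseline = mean(baseline_runs)
    if baseline == 0:
        return mean(recent_runs) > 0
    return mean(recent_runs) / baseline >= ratio_threshold
```

With the illustrative incident numbers (3 calls per run normally, 28 during the outage), this check would fire well before the bill arrives.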
Trade-offs
- Caching/dedupe can hide real changes (good for stability, bad for freshness).
- Budgets can cut off “almost done” runs (better than bankrupt runs).
- Safe-mode reduces answer quality, but improves reliability and cost control.
When NOT to use
- If freshness matters more than cost, don’t cache aggressively (use smaller windows).
- If a tool is deterministic and cheap, dedupe may be unnecessary (still keep budgets).
- If the task is deterministic, don’t use an agent at all. Use a workflow.
Copy-paste checklist
- [ ] Max tool calls per run
- [ ] Max time per run
- [ ] Dedupe window per (tool, args hash)
- [ ] Cache read tools (short TTL)
- [ ] Retry policy in one place (gateway), not in agent + tool
- [ ] Loop detection: repeated action keys stop the run
- [ ] Stop reasons: tool budget vs time budget vs loop detected
- [ ] Alert on spikes: tool_calls/run, spend/run, latency/run
Safe default config snippet (JSON/YAML)
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
tools:
  dedupe_window_s: 60
  cache_ttl_s: 30
  retries:
    max_attempts: 2
    retryable_status: [408, 429, 500, 502, 503, 504]
Related pages
- Foundations: Planning vs reactive agents · What makes an agent production-ready
- Failure: Budget explosion · Infinite loop
- Governance: Tool permissions (allowlists)
- Production stack: Production agent stack