The problem
You ask the agent one question.
It answers… eventually.
Except sometimes it doesn’t. It just keeps going:
- plan
- search
- fetch
- re-plan
- search again
The UI spins. Users retry. Now you’ve got two loops.
We’ve watched this turn one flaky request into a 700-call mess. Step limits stop the bleeding fast.
Step limits are the simplest governance control because they’re the closest thing you have to a circuit breaker for the agent’s brain: “you get N chances, then you stop.”
Why this fails in production
1) “Step” is fuzzy unless you define it
In many agent frameworks, one “step” can include:
- one model call
- plus multiple tool calls
- plus retries
If you only cap steps, you still get tool spam. Define and cap both:
- max steps
- max tool calls
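If your framework exposes steps and tool calls separately, the two caps can live in one small guard. A minimal sketch (the class and names are illustrative, not from any particular framework):

```python
class BudgetExceeded(RuntimeError):
    pass


class RunBudget:
    """Track model steps and tool calls as separate budgets,
    since one step can fan out into several tool calls."""

    def __init__(self, max_steps: int = 25, max_tool_calls: int = 12):
        self.max_steps = max_steps
        self.max_tool_calls = max_tool_calls
        self.steps = 0
        self.tool_calls = 0

    def on_step(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("max_steps")

    def on_tool_call(self) -> None:
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("max_tool_calls")
```

Either budget can trip first: a chatty planner burns steps, a tool-spamming run burns tool calls. Capping only one leaves the other unbounded.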
2) The model treats uncertainty as “try again”
When outputs are noisy (search, web, flaky APIs), the model doesn’t know it’s burning money. It sees uncertainty and does what we trained it to do: keep trying.
3) Repeat actions are the real loop signal
The most useful loop detector we’ve shipped isn’t fancy. It’s: “same tool + same args repeated N times”.
That catches:
- search thrashing
- auth retry loops
- “fetch the same page again”
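One cheap way to build that “same tool + same args” key is to hash a canonical JSON encoding, which is what the implementation example below does. A quick sketch showing why canonical ordering matters:

```python
import hashlib
import json


def action_key(name: str, args: dict) -> str:
    # sort_keys gives a canonical encoding, so logically identical
    # calls hash to the same key regardless of argument order
    raw = json.dumps({"name": name, "args": args}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


k1 = action_key("search", {"q": "vendor status", "page": 1})
k2 = action_key("search", {"page": 1, "q": "vendor status"})
assert k1 == k2  # same action either way
```

Without canonicalization, the same retry can produce a different key on every attempt and the detector never fires.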
4) Step limits need a good stop reason
If you stop without telling the user why, you’ll train them to retry. Stop reasons are part of the product.
Implementation example (real code)
This guard does two things:
- caps total steps
- stops on repeated actions (same tool+args key)
```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


def action_key(name: str, args: dict[str, Any]) -> str:
    raw = json.dumps({"name": name, "args": args}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class LoopDetected(RuntimeError):
    pass


class StepLimitExceeded(RuntimeError):
    pass


@dataclass
class StepPolicy:
    max_steps: int = 25
    max_repeat: int = 3


class StepGuard:
    def __init__(self, policy: StepPolicy):
        self.policy = policy
        self.steps = 0
        self.seen: dict[str, int] = {}

    def on_action(self, *, name: str, args: dict[str, Any]) -> None:
        self.steps += 1
        if self.steps > self.policy.max_steps:
            raise StepLimitExceeded("max_steps")
        key = action_key(name, args)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] >= self.policy.max_repeat:
            raise LoopDetected(f"repeat_action>={self.policy.max_repeat}")


def agent_loop(task: str, *, guard: StepGuard, tools) -> str:
    while True:
        action = decide(task)  # (pseudo) -> { kind, name, args }
        if action.kind == "final":
            return action.answer
        guard.on_action(name=action.name, args=action.args)
        obs = tools.call(action.name, action.args)
        task = update(task, action, obs)  # (pseudo)
```

The same guard in JavaScript:

```js
import crypto from "node:crypto";

export class LoopDetected extends Error {}
export class StepLimitExceeded extends Error {}

// Sort keys recursively so identical args hash the same regardless of
// property order (matching json.dumps(..., sort_keys=True) on the
// Python side). Plain JSON.stringify preserves insertion order, which
// would give the same action a different key on every retry.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((k) => [k, canonicalize(value[k])])
    );
  }
  return value;
}

export function actionKey(name, args) {
  const raw = JSON.stringify(canonicalize({ name, args }));
  return crypto.createHash("sha256").update(raw, "utf8").digest("hex");
}

export class StepGuard {
  constructor({ maxSteps = 25, maxRepeat = 3 } = {}) {
    this.maxSteps = maxSteps;
    this.maxRepeat = maxRepeat;
    this.steps = 0;
    this.seen = new Map(); // key -> count
  }

  onAction({ name, args }) {
    this.steps += 1;
    if (this.steps > this.maxSteps) throw new StepLimitExceeded("max_steps");
    const key = actionKey(name, args);
    const n = (this.seen.get(key) || 0) + 1;
    this.seen.set(key, n);
    if (n >= this.maxRepeat) throw new LoopDetected("repeat_action>=" + this.maxRepeat);
  }
}
```

Real failure case
We had an agent that “investigated an issue” by searching + fetching docs.
A vendor search outage made results unstable. The agent responded by re-searching, because it never hit a “good enough” condition.
Impact:
- one run hit 124 steps
- user retried twice (so three runs)
- total tool calls: ~90 (mostly search)
- spend: ~$38 for one user request
Fix:
- step caps + repeat detection (same tool+args)
- stop reason surfaced in UI (“loop detected” vs “budget hit”)
- safe-mode output: “search is unstable; here’s what I can do without it”
We didn’t need a better prompt. We needed brakes.
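The guard exceptions above map directly onto user-facing stop reasons. A sketch of the translation layer (exception classes redefined here so the snippet stands alone; the message strings are illustrative product copy, not from the incident):

```python
class LoopDetected(RuntimeError):
    pass


class StepLimitExceeded(RuntimeError):
    pass


def run_with_brakes(run):
    """Run the agent loop and translate guard exceptions into
    explicit stop reasons instead of silent failure."""
    try:
        return {"status": "ok", "answer": run()}
    except LoopDetected:
        return {
            "status": "stopped",
            "reason": "loop_detected",
            "user_message": "I kept repeating the same action, so I stopped. Here's what I have so far.",
        }
    except StepLimitExceeded:
        return {
            "status": "stopped",
            "reason": "max_steps",
            "user_message": "I hit my step budget. Here's a partial answer.",
        }
```

The UI renders `user_message` plus any partial results; the `reason` field feeds the dashboards.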
Trade-offs
- Repeat detection can stop legitimate “try again” behavior (tune max_repeat).
- Step caps reduce completion on genuinely hard tasks (use tiers / approvals).
- If you don’t also cap tool calls, step caps can still be expensive.
When NOT to use
- If you’re running async batch jobs, use time + cost budgets instead of tight step caps.
- If your tool outputs are deterministic and your plan is fixed, you may not need repeat detection (still keep max steps).
- If you can’t return stop reasons, users will retry and you’ll pay twice.
Copy-paste checklist
- [ ] Define “step” (what increments it?)
- [ ] Cap max steps per run
- [ ] Cap max tool calls per run (separate from steps)
- [ ] Add repeated-action detection (tool+args key)
- [ ] Return explicit stop reasons to the UI
- [ ] Alert on spikes: steps/run, loop_detected/run
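The two alert signals can start as plain per-run counters before you wire up a real metrics client. A sketch (metric names are illustrative):

```python
from collections import Counter

metrics = Counter()


def record_run(steps: int, loop_detected: bool) -> None:
    # Emit per-run metrics so spikes in steps/run and
    # loop_detected/run show up on a dashboard.
    metrics["runs"] += 1
    metrics["steps_total"] += steps
    if loop_detected:
        metrics["loop_detected"] += 1


def steps_per_run() -> float:
    return metrics["steps_total"] / max(metrics["runs"], 1)
```

Alert when `steps_per_run` or the `loop_detected` rate jumps; both spiked hours before the bill in the incident above.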
Safe default config snippet (YAML)

```yaml
steps:
  max_steps: 25
loop_detection:
  enabled: true
  max_repeat_action: 3
tools:
  max_tool_calls: 12
stop_reasons:
  return_to_user: true
```
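Once the YAML is parsed, loading it into a typed policy object catches typos and fills defaults. A sketch that assumes the snippet above has already been parsed into a dict (the class and field names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardConfig:
    max_steps: int = 25
    max_repeat_action: int = 3
    max_tool_calls: int = 12
    return_stop_reasons: bool = True


def load_guard_config(raw: dict) -> GuardConfig:
    # raw is the parsed YAML from the snippet above;
    # missing sections fall back to the safe defaults.
    return GuardConfig(
        max_steps=raw.get("steps", {}).get("max_steps", 25),
        max_repeat_action=raw.get("loop_detection", {}).get("max_repeat_action", 3),
        max_tool_calls=raw.get("tools", {}).get("max_tool_calls", 12),
        return_stop_reasons=raw.get("stop_reasons", {}).get("return_to_user", True),
    )
```

A frozen dataclass makes the policy immutable for the lifetime of a run, so a misbehaving tool can’t loosen its own limits.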
FAQ
Q: What’s a good default step limit?
A: Start with 25 for synchronous runs. If you need more, move the run async or require approval for a higher tier.
Q: Why not rely on the model to stop?
A: Because the model is optimized to keep trying. “Try again” looks like progress even when it’s just spend.
Q: Is loop detection required?
A: If you use search/web tools, yes. Repeat-action detection is a cheap way to catch thrash early.
Q: What should users see when we stop?
A: A stop reason and partial results. Silence trains retries.
Related pages
- Foundations: Planning vs reactive agents · Why agents fail in production
- Failure: Infinite loop · Tool spam loops
- Governance: Budget controls · Kill switch design
- Production stack: Production agent stack