Step Limits for Agents (Stop Loops Early) + Code

If your agent has no step limit, it’s a background process with feelings. This page covers step limits, repeat detection, and stop reasons that prevent infinite “one more try” runs.
On this page
  1. Problem-first intro
  2. Why this fails in production
  3. 1) “Step” is fuzzy unless you define it
  4. 2) The model treats uncertainty as “try again”
  5. 3) Repeat actions are the real loop signal
  6. 4) Step limits need a good stop reason
  7. Implementation example (real code)
  8. Real failure case (incident-style, with numbers)
  9. Trade-offs
  10. When NOT to use
  11. Copy-paste checklist
  12. Safe default config snippet (YAML)
  13. FAQ

Problem-first intro

You ask the agent one question.

It answers… eventually.

Except sometimes it doesn’t. It just keeps going:

  • plan
  • search
  • fetch
  • re-plan
  • search again

The UI spins. Users retry. Now you’ve got two loops.

We’ve watched this turn one flaky request into a 700-call mess. Step limits stop the bleeding fast.

Step limits are the simplest governance control because they’re the closest thing you have to a circuit breaker for the agent’s brain: “you get N chances, then you stop.”

Why this fails in production

1) “Step” is fuzzy unless you define it

In many agent frameworks, one “step” can do:

  • one model call
  • plus multiple tool calls
  • plus retries

If you only cap steps, you still get tool spam. Define and cap both:

  • max steps
  • max tool calls
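A minimal sketch of the two-cap idea, assuming you can hook both the model-decision point and the tool-invocation point. The names (`RunBudget`, `on_step`, `on_tool_call`) are placeholders, not a specific framework's API:

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    # Illustrative defaults; mirror these to whatever your framework calls a "step".
    max_steps: int = 25        # one model decision = one step
    max_tool_calls: int = 12   # every tool invocation counts, including retries
    steps: int = 0
    tool_calls: int = 0

    def on_step(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("stop: max_steps")

    def on_tool_call(self) -> None:
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise RuntimeError("stop: max_tool_calls")
```

Keeping the counters separate means a single "step" that fans out into five tool calls still burns five units of the tool budget.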

2) The model treats uncertainty as “try again”

When outputs are noisy (search, web, flaky APIs) the model doesn’t know it’s burning money. It sees uncertainty and does what we trained it to do: keep trying.
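One way to keep “try again” bounded is to cap retries at the tool layer and return an explicit failure instead of another chance. A hedged sketch (`call_with_retry_cap` is a hypothetical helper, not a framework function):

```python
import time


def call_with_retry_cap(fn, *, max_attempts: int = 2, backoff_s: float = 0.0):
    """Cap retries at the tool layer so 'try again' can't run forever.

    Returns (ok, value): after max_attempts the caller gets an explicit
    failure object instead of another opportunity to retry.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return True, fn()
        except Exception as err:  # noisy tool: search, web, flaky API
            last_err = err
            if backoff_s:
                time.sleep(backoff_s * (attempt + 1))
    return False, last_err
```

The point is that the model never sees an unbounded retry surface: uncertainty comes back as a concrete `(False, error)` result it has to reason about, not an invitation to spin.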

3) Repeat actions are the real loop signal

The most useful loop detector we’ve shipped isn’t fancy. It’s: “same tool + same args repeated N times”.

That catches:

  • search thrashing
  • auth retry loops
  • “fetch the same page again”

4) Step limits need a good stop reason

If you stop without telling the user why, you’ll train them to retry. Stop reasons are part of the product.
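A sketch of what a structured stop result might look like. `StopResult` and the message strings are illustrative, not a fixed schema:

```python
from dataclasses import dataclass


@dataclass
class StopResult:
    # Hypothetical shape for what the UI receives when a run is cut short.
    stop_reason: str      # "max_steps" | "loop_detected" | "budget_hit"
    partial_answer: str   # whatever the agent had so far
    user_message: str     # plain-language explanation, shown verbatim


MESSAGES = {
    "max_steps": "I hit the step limit before finishing. Here's what I found so far.",
    "loop_detected": "I kept repeating the same action, so I stopped. Here's a partial answer.",
    "budget_hit": "This run reached its budget. Here's what I can say without more calls.",
}


def stop_result(reason: str, partial: str) -> StopResult:
    return StopResult(reason, partial, MESSAGES.get(reason, "The run was stopped."))
```

Shipping partial answers alongside the reason is what breaks the retry reflex: the user can judge whether retrying is worth it.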

Implementation example (real code)

This guard does two things:

  1. caps total steps
  2. stops on repeated actions (same tool+args key)
PYTHON
import hashlib
import json
from dataclasses import dataclass
from typing import Any


def action_key(name: str, args: dict[str, Any]) -> str:
    raw = json.dumps({"name": name, "args": args}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class LoopDetected(RuntimeError):
    pass


class StepLimitExceeded(RuntimeError):
    pass


@dataclass
class StepPolicy:
    max_steps: int = 25
    max_repeat: int = 3


class StepGuard:
    def __init__(self, policy: StepPolicy):
        self.policy = policy
        self.steps = 0
        self.seen: dict[str, int] = {}

    def on_action(self, *, name: str, args: dict[str, Any]) -> None:
        self.steps += 1
        if self.steps > self.policy.max_steps:
            raise StepLimitExceeded("max_steps")

        key = action_key(name, args)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] >= self.policy.max_repeat:
            raise LoopDetected(f"repeat_action >= {self.policy.max_repeat}")


def agent_loop(task: str, *, guard: StepGuard, tools) -> str:
    while True:
        action = decide(task)  # (pseudo) -> object with .kind, .name, .args, .answer
        if action.kind == "final":
            return action.answer

        guard.on_action(name=action.name, args=action.args)
        obs = tools.call(action.name, action.args)
        task = update(task, action, obs)  # (pseudo)
JAVASCRIPT
import crypto from "node:crypto";

export class LoopDetected extends Error {}
export class StepLimitExceeded extends Error {}

export function actionKey(name, args) {
  // Sort top-level arg keys so { a, b } and { b, a } hash identically
  // (mirrors sort_keys=True in the Python version; nested objects are not sorted).
  const sorted = Object.fromEntries(
    Object.entries(args).sort(([a], [b]) => a.localeCompare(b)),
  );
  const raw = JSON.stringify({ name, args: sorted });
  return crypto.createHash("sha256").update(raw, "utf8").digest("hex");
}

export class StepGuard {
  constructor({ maxSteps = 25, maxRepeat = 3 } = {}) {
    this.maxSteps = maxSteps;
    this.maxRepeat = maxRepeat;
    this.steps = 0;
    this.seen = new Map(); // key -> count
  }

  onAction({ name, args }) {
    this.steps += 1;
    if (this.steps > this.maxSteps) throw new StepLimitExceeded("max_steps");

    const key = actionKey(name, args);
    const n = (this.seen.get(key) || 0) + 1;
    this.seen.set(key, n);
    if (n >= this.maxRepeat) throw new LoopDetected(`repeat_action >= ${this.maxRepeat}`);
  }
}
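To see the guard trip end to end, here is a self-contained simulation (the guard is restated inline, slightly simplified, so the snippet runs on its own): an agent that keeps re-issuing the same search is stopped on the third repeat, long before the step cap.

```python
class LoopDetected(RuntimeError):
    pass


class StepGuard:
    def __init__(self, max_steps: int = 25, max_repeat: int = 3):
        self.max_steps, self.max_repeat = max_steps, max_repeat
        self.steps, self.seen = 0, {}

    def on_action(self, name: str, args: tuple) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("max_steps")
        key = (name, args)  # hashable stand-in for the tool+args hash above
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] >= self.max_repeat:
            raise LoopDetected(f"repeated {name} x{self.seen[key]}")


guard = StepGuard()
transcript = []
try:
    # A thrashing agent: same search, same query, over and over.
    for _ in range(10):
        guard.on_action("search", ("vendor outage",))
        transcript.append("search")
except LoopDetected as exc:
    transcript.append(f"stopped: {exc}")
```

Note the loop would have run ten times; repeat detection cuts it at three, which is why it pays for itself even when a generous step cap is also in place.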

Real failure case (incident-style, with numbers)

We had an agent that “investigated an issue” by searching + fetching docs.

A vendor search outage made results unstable. The agent responded by re-searching, because it never hit a “good enough” condition.

Impact:

  • one run hit 124 steps
  • user retried twice (so three runs)
  • total tool calls: ~90 (mostly search)
  • spend: ~$38 for one user request

Fix:

  1. step caps + repeat detection (same tool+args)
  2. stop reason surfaced in UI (“loop detected” vs “budget hit”)
  3. safe-mode output: “search is unstable; here’s what I can do without it”

We didn’t need a better prompt. We needed brakes.

Trade-offs

  • Repeat detection can stop legitimate “try again” behavior (tune max_repeat).
  • Step caps reduce completion on genuinely hard tasks (use tiers / approvals).
  • If you don’t also cap tool calls, step caps can still be expensive.

When NOT to use

  • If you’re running async batch jobs, use time + cost budgets instead of tight step caps.
  • If your tool outputs are deterministic and your plan is fixed, you may not need repeat detection (still keep max steps).
  • If you can’t return stop reasons, users will retry and you’ll pay twice.
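For the async-batch case, a time + cost budget might look like the sketch below. `RunClock` and the dollar accounting are illustrative; real spend tracking depends on your billing data:

```python
import time


class BudgetExceeded(RuntimeError):
    pass


class RunClock:
    """Time + spend budget for long async runs (sketch; tune numbers per workload)."""

    def __init__(self, max_seconds: float = 600.0, max_usd: float = 5.0):
        self.deadline = time.monotonic() + max_seconds
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        """Call after each model/tool invocation with its estimated cost."""
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded("max_usd")
        if time.monotonic() > self.deadline:
            raise BudgetExceeded("max_seconds")
```

The same stop-reason discipline applies: a batch job killed by `max_usd` should say so in its result record, not just disappear.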

Copy-paste checklist

  • [ ] Define “step” (what increments it?)
  • [ ] Cap max steps per run
  • [ ] Cap max tool calls per run (separate from steps)
  • [ ] Add repeated-action detection (tool+args key)
  • [ ] Return explicit stop reasons to the UI
  • [ ] Alert on spikes: steps/run, loop_detected/run

Safe default config snippet (YAML)

YAML
steps:
  max_steps: 25
  loop_detection:
    enabled: true
    max_repeat_action: 3
tools:
  max_tool_calls: 12
stop_reasons:
  return_to_user: true

FAQ

What’s a good default step limit?
Start with 25 for synchronous runs. If you need more, move the run async or require approval for a higher tier.
Why not rely on the model to stop?
Because the model is optimized to keep trying. ‘Try again’ looks like progress even when it’s just spend.
Is loop detection required?
If you use search/web tools, yes. Repeat-action detection is a cheap way to catch thrash early.
What should users see when we stop?
A stop reason and partial results. Silence trains retries.


Updated: Mar 2026
Implement in OnceOnly
Budgets + permissions you can enforce at the boundary.
YAML
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
writes:
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true }
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.