The problem
You ask the agent one question.
It answers… eventually.
Except sometimes it doesn’t. It just keeps going:
- plan
- search
- fetch
- re-plan
- search again
The UI spins. Users retry. Now you’ve got two loops.
We’ve watched this turn one flaky request into a 700-call mess. Step limits stop the bleeding fast.
Step limits are the simplest governance control because they’re the closest thing you have to a circuit breaker for the agent’s brain: “you get N chances, then you stop.”
Why this fails in production
1) “Step” is fuzzy unless you define it
In many agent frameworks, one “step” can include:
- one model call
- plus multiple tool calls
- plus retries
If you only cap steps, you still get tool spam. Define and cap both:
- max steps
- max tool calls
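If your framework exposes steps and tool calls separately, the two caps can live in one small guard. A minimal sketch (the class and names are illustrative, not from any particular framework):

```python
class BudgetExceeded(RuntimeError):
    pass


class RunBudget:
    """Track model steps and tool calls as separate budgets,
    since one step can fan out into several tool calls."""

    def __init__(self, max_steps: int = 25, max_tool_calls: int = 12):
        self.max_steps = max_steps
        self.max_tool_calls = max_tool_calls
        self.steps = 0
        self.tool_calls = 0

    def on_step(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("max_steps")

    def on_tool_call(self) -> None:
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("max_tool_calls")
```

Either budget can trip first: a chatty planner burns steps, a tool-spamming run burns tool calls. Capping only one leaves the other unbounded.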
2) The model treats uncertainty as “try again”
When outputs are noisy (search, web, flaky APIs), the model doesn’t know it’s burning money. It sees uncertainty and does what we trained it to do: keep trying.
3) Repeat actions are the real loop signal
The most useful loop detector we’ve shipped isn’t fancy. It’s: “same tool + same args repeated N times”.
That catches:
- search thrashing
- auth retry loops
- “fetch the same page again”
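One cheap way to build that “same tool + same args” key is to hash a canonical JSON encoding, which is what the implementation example below does. A quick sketch showing why canonical ordering matters:

```python
import hashlib
import json


def action_key(name: str, args: dict) -> str:
    # sort_keys gives a canonical encoding, so logically identical
    # calls hash to the same key regardless of argument order
    raw = json.dumps({"name": name, "args": args}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


k1 = action_key("search", {"q": "vendor status", "page": 1})
k2 = action_key("search", {"page": 1, "q": "vendor status"})
assert k1 == k2  # same action either way
```

Without canonicalization, the same retry can produce a different key on every attempt and the detector never fires.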
4) Step limits need a good stop reason
If you stop without telling the user why, you’ll train them to retry. Stop reasons are part of the product.
Implementation example (real code)
This guard does two things:
- caps total steps
- stops on repeated actions (same tool+args key)
```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


def action_key(name: str, args: dict[str, Any]) -> str:
    raw = json.dumps({"name": name, "args": args}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class LoopDetected(RuntimeError):
    pass


class StepLimitExceeded(RuntimeError):
    pass


@dataclass
class StepPolicy:
    max_steps: int = 25
    max_repeat: int = 3


class StepGuard:
    def __init__(self, policy: StepPolicy):
        self.policy = policy
        self.steps = 0
        self.seen: dict[str, int] = {}

    def on_action(self, *, name: str, args: dict[str, Any]) -> None:
        self.steps += 1
        if self.steps > self.policy.max_steps:
            raise StepLimitExceeded("max_steps")
        key = action_key(name, args)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] >= self.policy.max_repeat:
            raise LoopDetected(f"repeat_action>={self.policy.max_repeat}")


def agent_loop(task: str, *, guard: StepGuard, tools) -> str:
    while True:
        action = decide(task)  # (pseudo) -> { kind, name, args }
        if action.kind == "final":
            return action.answer
        guard.on_action(name=action.name, args=action.args)
        obs = tools.call(action.name, action.args)
        task = update(task, action, obs)  # (pseudo)
```

The same guard in JavaScript:

```js
import crypto from "node:crypto";

export class LoopDetected extends Error {}
export class StepLimitExceeded extends Error {}

// Sort keys recursively so identical args hash the same regardless of
// property order (matching json.dumps(..., sort_keys=True) on the
// Python side). Plain JSON.stringify preserves insertion order, which
// would give the same action a different key on every retry.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((k) => [k, canonicalize(value[k])])
    );
  }
  return value;
}

export function actionKey(name, args) {
  const raw = JSON.stringify(canonicalize({ name, args }));
  return crypto.createHash("sha256").update(raw, "utf8").digest("hex");
}

export class StepGuard {
  constructor({ maxSteps = 25, maxRepeat = 3 } = {}) {
    this.maxSteps = maxSteps;
    this.maxRepeat = maxRepeat;
    this.steps = 0;
    this.seen = new Map(); // key -> count
  }

  onAction({ name, args }) {
    this.steps += 1;
    if (this.steps > this.maxSteps) throw new StepLimitExceeded("max_steps");
    const key = actionKey(name, args);
    const n = (this.seen.get(key) || 0) + 1;
    this.seen.set(key, n);
    if (n >= this.maxRepeat) throw new LoopDetected("repeat_action>=" + this.maxRepeat);
  }
}
```

Real failure case
We had an agent that “investigated an issue” by searching + fetching docs.
A vendor search outage made results unstable. The agent responded by re-searching, because it never hit a “good enough” condition.
Impact:
- one run hit 124 steps
- user retried twice (so three runs)
- total tool calls: ~90 (mostly search)
- spend: ~$38 for one user request
Fix:
- step caps + repeat detection (same tool+args)
- stop reason surfaced in UI (“loop detected” vs “budget hit”)
- safe-mode output: “search is unstable; here’s what I can do without it”
We didn’t need a better prompt. We needed brakes.
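The guard exceptions above map directly onto user-facing stop reasons. A sketch of the translation layer (exception classes redefined here so the snippet stands alone; the message strings are illustrative product copy, not from the incident):

```python
class LoopDetected(RuntimeError):
    pass


class StepLimitExceeded(RuntimeError):
    pass


def run_with_brakes(run):
    """Run the agent loop and translate guard exceptions into
    explicit stop reasons instead of silent failure."""
    try:
        return {"status": "ok", "answer": run()}
    except LoopDetected:
        return {
            "status": "stopped",
            "reason": "loop_detected",
            "user_message": "I kept repeating the same action, so I stopped. Here's what I have so far.",
        }
    except StepLimitExceeded:
        return {
            "status": "stopped",
            "reason": "max_steps",
            "user_message": "I hit my step budget. Here's a partial answer.",
        }
```

The UI renders `user_message` plus any partial results; the `reason` field feeds the dashboards.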
Trade-offs
- Repeat detection can stop legitimate “try again” behavior (tune max_repeat).
- Step caps reduce completion on genuinely hard tasks (use tiers / approvals).
- If you don’t also cap tool calls, step caps can still be expensive.
When NOT to use
- If you’re running async batch jobs, use time + cost budgets instead of tight step caps.
- If your tool outputs are deterministic and your plan is fixed, you may not need repeat detection (still keep max steps).
- If you can’t return stop reasons, users will retry and you’ll pay twice.
Copy-paste checklist
- [ ] Define “step” (what increments it?)
- [ ] Cap max steps per run
- [ ] Cap max tool calls per run (separate from steps)
- [ ] Add repeated-action detection (tool+args key)
- [ ] Return explicit stop reasons to the UI
- [ ] Alert on spikes: steps/run, loop_detected/run
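The two alert signals can start as plain per-run counters before you wire up a real metrics client. A sketch (metric names are illustrative):

```python
from collections import Counter

metrics = Counter()


def record_run(steps: int, loop_detected: bool) -> None:
    # Emit per-run metrics so spikes in steps/run and
    # loop_detected/run show up on a dashboard.
    metrics["runs"] += 1
    metrics["steps_total"] += steps
    if loop_detected:
        metrics["loop_detected"] += 1


def steps_per_run() -> float:
    return metrics["steps_total"] / max(metrics["runs"], 1)
```

Alert when `steps_per_run` or the `loop_detected` rate jumps; both spiked hours before the bill in the incident above.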
Safe default config snippet (YAML)

```yaml
steps:
  max_steps: 25
loop_detection:
  enabled: true
  max_repeat_action: 3
tools:
  max_tool_calls: 12
stop_reasons:
  return_to_user: true
```
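Once the YAML is parsed, loading it into a typed policy object catches typos and fills defaults. A sketch that assumes the snippet above has already been parsed into a dict (the class and field names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardConfig:
    max_steps: int = 25
    max_repeat_action: int = 3
    max_tool_calls: int = 12
    return_stop_reasons: bool = True


def load_guard_config(raw: dict) -> GuardConfig:
    # raw is the parsed YAML from the snippet above;
    # missing sections fall back to the safe defaults.
    return GuardConfig(
        max_steps=raw.get("steps", {}).get("max_steps", 25),
        max_repeat_action=raw.get("loop_detection", {}).get("max_repeat_action", 3),
        max_tool_calls=raw.get("tools", {}).get("max_tool_calls", 12),
        return_stop_reasons=raw.get("stop_reasons", {}).get("return_to_user", True),
    )
```

A frozen dataclass makes the policy immutable for the lifetime of a run, so a misbehaving tool can’t loosen its own limits.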
FAQ
Q: What’s a good default step limit?
A: Start with 25 for synchronous runs. If you need more, move the run async or require approval for a higher tier.
Q: Why not rely on the model to stop?
A: Because the model is optimized to keep trying. “Try again” looks like progress even when it’s just spend.
Q: Is loop detection required?
A: If you use search/web tools, yes. Repeat-action detection is a cheap way to catch thrash early.
Q: What should users see when we stop?
A: A stop reason and partial results. Silence trains retries.
Related pages
- Foundations: Planning vs reactive agents · Why agents fail in production
- Failure: Infinite loop · Tool spam loops
- Governance: Budget controls · Kill switch design
- Production stack: Production agent stack