Single-Step Agents (Anti-Pattern) + Fixes + Code

  • Recognize the trap before it ships to prod.
  • See what breaks when the model is confidently wrong.
  • Copy safer defaults: permissions, budgets, idempotency.
  • Know when you shouldn’t use an agent at all.
Detection signals
  • Tool calls per run spike (or repeat with the same args hash).
  • Spend or tokens per request climbs without better outputs.
  • Retries shift from rare to constant (429/5xx).
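The repeat signal is cheap to compute from traces. A minimal sketch, assuming trace events shaped like the JSON trace later on this page (`tool_call` events carrying a tool name and args):

```python
import hashlib
import json
from collections import Counter


def args_hash(tool: str, args: dict) -> str:
    """Stable fingerprint of one tool call: same tool + same args -> same hash."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def repeated_calls(trace: list[dict], threshold: int = 3) -> list[str]:
    """Arg hashes that repeat within one run -- the stuck-loop smell above."""
    counts = Counter(
        args_hash(e["tool"], e["args"])
        for e in trace
        if e.get("event") == "tool_call"
    )
    return [h for h, n in counts.items() if n >= threshold]
```

Run this over per-run traces and alert when the returned list is non-empty.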
A 'single-step agent' is usually a chat completion glued to side effects. This page covers why that breaks in production, and what a minimal production loop looks like instead.
On this page
  1. Problem-first intro
  2. Why this fails in production
  3. 1) No feedback loop = no recovery
  4. 2) Budgets and stop reasons get bolted on too late
  5. 3) Tool output gets ignored or misused
  6. 4) Writes become a coin flip
  7. When single-step is enough (yes, sometimes)
  8. Hard routing rule (the one that saves you)
  9. Migration path (single-step → loop)
  10. Implementation example (real code)
  11. Failure evidence (what it looks like when it breaks)
  12. Example failure case (composite)
  13. 🚨 Incident: Premature ticket closure
  14. Trade-offs
  15. When NOT to use
  16. Copy-paste checklist
  17. Safe default config
  18. FAQ
  19. Related pages
  20. Production takeaway
  21. What breaks without this
  22. What works with this
  23. Minimum to ship
Quick take

Single-step “agents” (one model call → execute → done) have no place for validation, no recovery loop, and no stop reasons. They fail because production systems are noisy. If you have tools or side effects, you need a bounded loop + governance.

You'll learn: When single-step is actually fine • The minimal safe routing rule • A bounded loop interface • Stop reasons • A real incident smell test

Concrete contrast

Single-step: validation has nowhere to live • recovery happens in “clever prompts” • writes happen too early
Looped runner: budgets • tool gateway • stop reasons • safe-mode
Impact: fewer incidents + debuggable failures instead of “execute & pray”


Problem-first intro

Somebody says: “we built an agent”.

The code is:

  1. Call the model once
  2. Parse a tool call
  3. Execute it
  4. Return whatever happened
Truth

That’s not an agent. That’s a function call with unpredictable arguments.

In a demo it feels fast. In production it fails for the reason you built agents in the first place: real systems are noisy, and you need feedback + control.


Why this fails in production

Failure analysis

1) No feedback loop = no recovery

Production is full of timeouts, partial responses, 429s, stale data, and schema drift. A single-step design has nowhere to put recovery logic, so teams push “recovery” into prompts and then execute it blindly.
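In a loop, recovery has a place to live: a bounded retry in code, not a clever prompt. A minimal sketch (the `TransientError` type and the retryable status set are illustrative assumptions, not a specific library's API):

```python
import time


class TransientError(Exception):
    """Illustrative transient failure carrying an HTTP-style status code."""
    def __init__(self, status: int):
        super().__init__(f"status={status}")
        self.status = status


RETRYABLE = {429, 500, 502, 503, 504}


def call_with_recovery(call, *, max_attempts: int = 3, base_delay: float = 0.5):
    """Bounded retry with exponential backoff; raises once the budget is spent."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError as err:
            if err.status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts: surface it
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

The point is the shape: recovery is explicit, budgeted, and observable, instead of "the prompt told the model to try again".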

2) Budgets and stop reasons get bolted on too late

Teams say: “it can’t loop, so we don’t need budgets.”

Then they add retries in tools, retries in the model call, and a second tool call “just in case”.

Truth

Congrats, you reinvented loops without governance.

3) Tool output gets ignored or misused

If you only call a tool once, what do you do with the output? Usually you just return it. That means no validation, no invariants, and no “did we actually solve the task?” check.
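A loop gives validation somewhere to live: check invariants on the observation before feeding it back to the model. A sketch with an assumed result schema (a `results` list whose items carry a 0..1 relevance `score`):

```python
def validate_search_result(obs: dict) -> list[str]:
    """Return invariant violations; an empty list means the output is usable.
    The schema checked here is an illustrative assumption."""
    results = obs.get("results")
    if not isinstance(results, list):
        return ["results missing or not a list"]
    problems = []
    if not results:
        problems.append("zero results: task not solved yet, keep looping")
    for r in results:
        score = r.get("score", -1.0)
        if not 0.0 <= score <= 1.0:
            problems.append(f"score out of range: {score}")
    return problems
```

A non-empty list is a signal for the loop (retry, refine, or stop with a reason), not something to return to the user as-is.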

4) Writes become a coin flip

In a single-step design, the model can propose a write immediately. There’s no “read first, write later” policy. The blast radius arrives early.
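One way to encode “read first, write later” is a gate that checks loop state before any write. Minimal sketch; the state and action field names are assumptions:

```python
def may_write(state: dict, action: dict) -> bool:
    """'Read first, write later': allow a write only after the loop has
    recorded at least one successful read. Field names are illustrative."""
    if action.get("kind") != "write":
        return True  # reads and plain answers pass through
    return state.get("successful_reads", 0) >= 1
```

With this gate in the tool path, the earliest possible write moves from step 0 to after the system has actually observed something.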


When single-step is enough (yes, sometimes)

Single-step is fine when all of this is true:

  • No tools (or tools are strictly read-only)
  • No side effects (no state changes)
  • Output is used as text, not as a command
  • You can validate output with a strict schema (or you don’t need to)

Decision framework: single-step is OK only if all are true:

  • ✅ Read-only (no side effects)
  • ✅ Strongly typed output (or no tools)
  • ✅ Failure is cheap (low blast radius)
  • ✅ No retries/recovery loop needed

If any of those are false, route to a looped runner.
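The four conditions can be enforced in code rather than left as a judgment call per task. A sketch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    read_only: bool       # no side effects
    typed_output: bool    # strict schema (or no tools at all)
    cheap_failure: bool   # low blast radius
    needs_recovery: bool  # retries / tool feedback required


def allow_single_step(p: TaskProfile) -> bool:
    """Single-step only when every checklist condition holds; else loop."""
    return p.read_only and p.typed_output and p.cheap_failure and not p.needs_recovery
```

Any `False` in the first three, or `True` in the last, routes the task to the looped runner.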


Hard routing rule (the one that saves you)

If the next step can cause side effects, a single-step path is not allowed.

TEXT
if action.has_side_effects:
  run_looped_runner()
else:
  run_single_step()

This sounds obvious. It’s not obvious when the demo is working and nobody has been paged yet.


Migration path (single-step → loop)

This is what teams usually ship, and why it breaks:

PYTHON
# v1: single-step (fast, unsafe)
result = tool(llm_decide(task))  # damage can happen before validation

# v2: add validation (still unsafe if the tool already ran)
result = tool(llm_decide(task))
if not valid(result):
    raise RuntimeError("too late: side effect already happened")

# v3: bounded loop (safe enough to operate)
for step in range(max_steps):
    action = llm_decide(state)
    if action.kind == "tool":
        obs = tool_gateway.call(action.name, action.args)  # policy + budgets
        state = update(state, obs)
    else:
        return action.final_answer
raise RuntimeError("stop_reason=budget.max_steps")  # exhausting the budget is a stop reason, not a silent None

Implementation example (real code)

This pattern keeps single-step where it belongs (safe, read-only), and routes everything else to a bounded loop runner.

PYTHON
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal


@dataclass(frozen=True)
class Budgets:
  max_steps: int = 25
  max_tool_calls: int = 12
  max_seconds: int = 60


class Stopped(RuntimeError):
  def __init__(self, stop_reason: str):
      super().__init__(stop_reason)
      self.stop_reason = stop_reason


def is_side_effecting(action: dict[str, Any]) -> bool:
  # Production: decide side-effect class in code, not by prompt vibes.
  return action.get("kind") in {"write", "payment", "email", "ticket_close"}


def run_single_step(task: str, *, llm) -> dict[str, Any]:
  """
  Safe single-step: no tools, no writes.
  This is a completion, not an agent.
  """
  text = llm.text({"task": task, "style": "direct"})  # (pseudo)
  return {"status": "ok", "stop_reason": "single_step", "answer": text}


def run_looped(task: str, *, budgets: Budgets, runner) -> dict[str, Any]:
  """
  Delegate to a bounded runner that has:
  - tool gateway
  - output validation
  - stop reasons
  """
  return runner.run(task, budgets=budgets)  # (pseudo)


def route(task: str, *, llm, budgets: Budgets, runner) -> dict[str, Any]:
  # First decision is read-only: are we about to do anything with side effects?
  action = llm.json(
      {
          "task": task,
          "rule": "Return JSON {kind: 'read_only'|'side_effects'} and nothing else.",
          "examples": [{"task": "Summarize this text", "kind": "read_only"}, {"task": "Close ticket #123", "kind": "side_effects"}],
      }
  )  # (pseudo)

  if action.get("kind") == "side_effects":
      return run_looped(task, budgets=budgets, runner=runner)

  return run_single_step(task, llm=llm)
JAVASCRIPT
export class Stopped extends Error {
  constructor(stopReason) {
    super(stopReason);
    this.stop_reason = stopReason;
  }
}

export function runSingleStep(task, { llm }) {
  // Safe single-step: no tools, no writes.
  return llm
    .text({ task, style: "direct" })
    .then((text) => ({ status: "ok", stop_reason: "single_step", answer: text })); // (pseudo)
}

export function runLooped(task, { budgets, runner }) {
  // Delegate to a bounded runner with tool gateway + stop reasons.
  return runner.run(task, { budgets }); // (pseudo)
}

export async function route(task, { llm, budgets, runner }) {
  const action = await llm.json({
    task,
    rule: "Return JSON {kind: 'read_only'|'side_effects'} and nothing else.",
    examples: [
      { task: "Summarize this text", kind: "read_only" },
      { task: "Close ticket #123", kind: "side_effects" },
    ],
  }); // (pseudo)

  if (action.kind === "side_effects") return runLooped(task, { budgets, runner });
  return runSingleStep(task, { llm });
}
Note

This doesn’t look “agentic”. It looks operable. That’s the point.


Failure evidence (what it looks like when it breaks)

Single-step failures show up as “one bad decision with immediate blast radius”.

A trace that explains the incident in 5 lines:

JSON
{"run_id":"run_44a1","step":0,"event":"tool_call","tool":"ticket.close","args_hash":"b5d0aa","decision":"allow"}
{"run_id":"run_44a1","step":0,"event":"tool_result","tool":"ticket.close","ok":true}
{"run_id":"run_44a1","step":0,"event":"stop","reason":"success","note":"single-step"}

If that makes you uncomfortable, good.
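A trace like that folds into an incident summary with a few lines of code. Sketch, assuming one JSON event per line as in the trace above:

```python
import json


def summarize_run(jsonl: str) -> dict:
    """Fold a JSONL trace into what an on-call needs first:
    how many tool calls ran, and why the run stopped."""
    events = [json.loads(line) for line in jsonl.strip().splitlines()]
    return {
        "tool_calls": sum(1 for e in events if e["event"] == "tool_call"),
        "stop_reason": next((e["reason"] for e in events if e["event"] == "stop"), None),
    }
```

If `stop_reason` comes back `None`, the run ended without explaining itself, which is its own incident signal.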


Example failure case (composite)

Incident

🚨 Incident: Premature ticket closure

System: Single-step “close resolved tickets” agent
Duration: under 1 hour
Impact: 18 tickets incorrectly closed


What happened

The agent called ticket.close immediately based on a snippet. It misread sarcasm as “resolved”.

The worst part: nobody could explain why. There was no loop state, no stop reasons that mattered, and no opportunity to validate.


Fix

  1. Route side-effecting actions to a looped runner
  2. Tool gateway policy + audit logs
  3. Approvals for ticket.close
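The approval step works as a hard gate in the tool path, not a prompt instruction. Sketch; the tool names and the `approver` interface are illustrative:

```python
APPROVAL_REQUIRED = {"ticket.close", "payment.capture", "email.send"}  # illustrative set


def gate_tool_call(tool: str, args: dict, *, approver) -> str:
    """Side-effecting tools need an explicit approval decision before
    execution; everything else passes. `approver` is any callable -> bool."""
    if tool not in APPROVAL_REQUIRED:
        return "allow"
    return "allow" if approver(tool, args) else "deny"
```

With this in place, a misread sarcastic ticket produces a pending approval and an audit entry instead of a closed ticket.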

Trade-offs

  • A loop is more code than one model call.
  • More steps can mean more latency (budgets help).
  • You need observability (but you needed it anyway).

When NOT to use

Don’t
  • If you truly have one deterministic transform, don’t call it an agent.
  • If your task needs tool feedback and recovery, single-step will be fragile.
  • If you can’t log traces and stop reasons, fix observability first.

Copy-paste checklist

Production checklist
  • [ ] If you have side effects, you need a looped runner
  • [ ] Route side-effecting tasks away from single-step
  • [ ] Add budgets (steps, tool calls, seconds)
  • [ ] Use a tool gateway (default-deny allowlist)
  • [ ] Validate tool outputs before acting
  • [ ] Return stop reasons (and log them)
  • [ ] Require approvals for writes

Safe default config

YAML
routing:
  allow_single_step_only_when: "read_only"
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
tools:
  allow: ["search.read", "kb.read", "http.get"]
writes:
  require_approval: true
stop_reasons:
  return_to_user: true
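Enforcing the default-deny allowlist is one comparison once the config is loaded. A sketch over the same config expressed as a dict (the parsed shape is an assumption):

```python
CONFIG = {  # dict form of the YAML above (assumed parse result)
    "routing": {"allow_single_step_only_when": "read_only"},
    "budgets": {"max_steps": 25, "max_tool_calls": 12, "max_seconds": 60},
    "tools": {"allow": ["search.read", "kb.read", "http.get"]},
    "writes": {"require_approval": True},
}


def tool_allowed(tool: str, config: dict = CONFIG) -> bool:
    """Default-deny: a tool runs only if it is explicitly on the allowlist.
    Missing config sections deny everything rather than allowing everything."""
    return tool in config.get("tools", {}).get("allow", [])
```

Note the failure mode: an empty or missing `tools` section blocks all tools, which is the safe direction to fail.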

FAQ

Are single-step agents ever okay?
Yes — when there are no tools and no side effects. At that point it’s a completion, not an agent.
Isn’t a loop slower?
It can be. Roughly: single-step is ~1 LLM call + 1 tool call; a 3-step loop is ~3 LLM calls + 2 tool calls. That’s often ~3× latency. Budgets cap the worst case — and speed doesn’t matter if it’s wrong or unoperable.
What’s the minimum governance for a loop?
Step limits, tool-call budgets, default-deny tool policy, and stop reasons.
Where do I get a good loop pattern?
Start with a bounded ReAct-style runner and a tool gateway. Don’t invent your own loop without budgets and traces.

Related

Production takeaway


What breaks without this

  • ❌ Writes happen before validation
  • ❌ “Recovery” lives in prompts and tool retries
  • ❌ No stop reasons that explain behavior

What works with this

  • ✅ Side effects route to a bounded runner
  • ✅ Budgets + tool gateway keep runs controllable
  • ✅ Failures are explainable (stop reasons + traces)

Minimum to ship

  1. Routing rule (read-only can be single-step; side effects can’t)
  2. Bounded runner (budgets + stop reasons)
  3. Tool gateway (deny by default)
  4. Validation layer (before writes)

⏱️ 8 min read · Updated Mar 2026 · Difficulty: ★★★
Implement in OnceOnly
Safe defaults for tool permissions + write gating.
YAML
# onceonly guardrails (concept)
version: 1
tools:
  default_mode: read_only
  allowlist:
    - search.read
    - kb.read
    - http.get
writes:
  enabled: false
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true, mode: disable_writes }
audit:
  enabled: true
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.