Budget Controls for AI Agents (Steps, Time, $) + Code

If your agent can spend unlimited time and money, it will. This page shows a production budget policy that stops runs safely and returns stop reasons you can alert on.
On this page
  1. Problem-first intro
  2. Why this fails in production
  3. 1) Teams budget *one thing* and forget the rest
  4. 2) Retries are multiplicative in loops
  5. 3) Budgets without stop reasons are invisible
  6. 4) Budget enforcement scattered across the codebase doesn’t work
  7. Implementation example (real code)
  8. Real failure case (incident-style, with numbers)
  9. Trade-offs
  10. When NOT to use
  11. Copy-paste checklist
  12. Safe default config snippet (JSON/YAML)
  13. FAQ

Problem-first intro

Your agent “works” in staging.

Then production traffic hits and you learn two things:

  1. the agent is a loop, and loops don’t stop out of kindness
  2. finance is not a monitoring system (but they will page you anyway)

We’ve seen the same pattern over and over:

  • a flaky tool adds retries
  • retries add tool calls
  • tool calls add more model tokens (“here’s what happened, try again”)
  • and suddenly your “few cents” agent is doing $8–$20 per run

At scale, that’s not “a bug”. That’s a surprise subscription your CFO didn’t sign up for.

Budgets aren’t “cost optimization”. They’re safety controls. They decide what happens when the agent can’t finish.

If you don’t decide, the agent decides. And the agent’s decision is usually: “one more try”.

Why this fails in production

Budget failures are boring. That’s why they ship.

1) Teams budget one thing and forget the rest

Common mistake: “we have a token budget”.

Cool. Your agent just spent $0.04 on tokens and $6 on browser automation.

Production budgets need at least:

  • max_steps (control loop length)
  • max_seconds (wall clock time)
  • max_tool_calls (blast radius)
  • max_usd (the “nope” line)
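The token-vs-tool gap from the example above is easy to quantify. A hypothetical back-of-the-envelope check (prices are placeholders, not real vendor rates):

```python
# Placeholder prices: ~$2 per 1M tokens, $0.20 per browser automation run.
TOKEN_USD_PER_1M = 2.00
BROWSER_USD_PER_RUN = 0.20

def run_cost(tokens: int, browser_runs: int) -> tuple[float, float]:
    """Return (model_usd, tool_usd) for one agent run."""
    return tokens / 1_000_000 * TOKEN_USD_PER_1M, browser_runs * BROWSER_USD_PER_RUN

model_usd, tool_usd = run_cost(tokens=20_000, browser_runs=30)
# Token spend: $0.04. Browser spend: $6.00.
# A token-only budget sees under 1% of the bill.
```

This is why the USD cap has to sum model *and* tool spend, as the guard below does.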

2) Retries are multiplicative in loops

One retry isn’t the problem. Retries inside an agent loop (plus tool retries) are a cost multiplier.
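The multiplication is easy to see with hypothetical numbers: attempts compound across layers instead of adding.

```python
def worst_case_tool_calls(loop_attempts: int, tool_retries: int) -> int:
    """Each agent-loop attempt can exhaust the tool wrapper's own retry budget."""
    return loop_attempts * (1 + tool_retries)

# 5 loop attempts, each wrapping a tool that retries 3 times:
# 5 * (1 + 3) = 20 tool calls from what looks like "one" flaky action.
```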

3) Budgets without stop reasons are invisible

If the run ends with a timeout, users retry. That creates more runs.

You want explicit stop reasons:

  • max_seconds
  • max_tool_calls
  • max_usd
  • loop_detected

Stop reasons are observability.
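As a sketch, tallying stop reasons across runs turns budget hits into an alertable metric (the record shape here is illustrative, matching the `status`/`stop_reason` fields the guard below returns):

```python
from collections import Counter

def stop_reason_counts(runs: list[dict]) -> Counter:
    """Tally stop reasons; a spike in max_usd or max_tool_calls is your alert signal."""
    return Counter(r["stop_reason"] for r in runs if r.get("status") == "stopped")

runs = [
    {"status": "ok"},
    {"status": "stopped", "stop_reason": "max_seconds"},
    {"status": "stopped", "stop_reason": "max_usd"},
    {"status": "stopped", "stop_reason": "max_usd"},
]
# stop_reason_counts(runs) tallies 2x max_usd, 1x max_seconds.
```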

4) Budget enforcement scattered across the codebase doesn’t work

If budgets are checked:

  • sometimes in the agent
  • sometimes in the tool wrapper
  • sometimes not at all

you will miss a path.

Put budgets in one choke point: the run loop + tool gateway.

Implementation example (real code)

This is a production-shaped budget guard:

  • checks budgets continuously (not “at the end”)
  • tracks model + tool cost (approx is fine)
  • throws a typed stop reason you can log + alert on
PYTHON
from dataclasses import dataclass, field
import time
from typing import Any


TOOL_USD = {
  "search.read": 0.00,
  "http.get": 0.00,
  "browser.run": 0.20,  # placeholder
}


@dataclass(frozen=True)
class BudgetPolicy:
  max_steps: int = 25
  max_seconds: int = 60
  max_tool_calls: int = 12
  max_usd: float = 1.00


@dataclass
class BudgetState:
  started_at: float = field(default_factory=time.time)
  steps: int = 0
  tool_calls: int = 0
  tokens_in: int = 0
  tokens_out: int = 0
  tool_usd: float = 0.0

  def elapsed_s(self) -> float:
      return time.time() - self.started_at


def estimate_model_usd(tokens_in: int, tokens_out: int) -> float:
  # Replace with your real pricing model(s). Approximate is fine for guards.
  return (tokens_in + tokens_out) * 0.000002


class BudgetExceeded(RuntimeError):
  def __init__(self, stop_reason: str, *, state: BudgetState):
      super().__init__(stop_reason)
      self.stop_reason = stop_reason
      self.state = state


class BudgetGuard:
  def __init__(self, policy: BudgetPolicy):
      self.policy = policy
      self.state = BudgetState()

  def total_usd(self) -> float:
      return estimate_model_usd(self.state.tokens_in, self.state.tokens_out) + self.state.tool_usd

  def check(self) -> None:
      if self.state.steps > self.policy.max_steps:
          raise BudgetExceeded("max_steps", state=self.state)
      if self.state.elapsed_s() > self.policy.max_seconds:
          raise BudgetExceeded("max_seconds", state=self.state)
      if self.state.tool_calls > self.policy.max_tool_calls:
          raise BudgetExceeded("max_tool_calls", state=self.state)
      if self.total_usd() > self.policy.max_usd:
          raise BudgetExceeded("max_usd", state=self.state)

  def on_step(self) -> None:
      self.state.steps += 1
      self.check()

  def on_model_call(self, *, tokens_in: int, tokens_out: int) -> None:
      self.state.tokens_in += tokens_in
      self.state.tokens_out += tokens_out
      self.check()

  def on_tool_call(self, *, tool: str) -> None:
      self.state.tool_calls += 1
      self.state.tool_usd += float(TOOL_USD.get(tool, 0.0))
      self.check()


def run_agent(task: str, *, policy: BudgetPolicy) -> dict[str, Any]:
  guard = BudgetGuard(policy)

  try:
      while True:
          guard.on_step()

          # model decides next action (pseudo)
          action, tokens_in, tokens_out = llm_decide(task)  # (pseudo)
          guard.on_model_call(tokens_in=tokens_in, tokens_out=tokens_out)

          if action.kind == "tool":
              guard.on_tool_call(tool=action.name)
              obs = call_tool(action.name, action.args)  # (pseudo)
              task = update_state(task, action, obs)  # (pseudo)
              continue

          return {"status": "ok", "answer": action.final_answer, "usage": guard.state.__dict__}

  except BudgetExceeded as e:
      return {
          "status": "stopped",
          "stop_reason": e.stop_reason,
          "usage": e.state.__dict__,
          "partial": "Stopped by budget. Return partial results + a reason users can understand.",
      }
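To sanity-check the stop behavior, here is a trimmed, self-contained sketch of the same guard pattern, driven by a loop that never produces a final answer (only the step cap is kept, for brevity):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BudgetPolicy:
    max_steps: int = 25

@dataclass
class BudgetState:
    steps: int = 0

class BudgetExceeded(RuntimeError):
    def __init__(self, stop_reason: str, *, state: BudgetState):
        super().__init__(stop_reason)
        self.stop_reason = stop_reason
        self.state = state

class BudgetGuard:
    def __init__(self, policy: BudgetPolicy):
        self.policy = policy
        self.state = BudgetState()

    def on_step(self) -> None:
        self.state.steps += 1
        if self.state.steps > self.policy.max_steps:
            raise BudgetExceeded("max_steps", state=self.state)

def run_forever(policy: BudgetPolicy) -> dict:
    guard = BudgetGuard(policy)
    try:
        while True:
            guard.on_step()  # an agent loop that never reaches a final answer
    except BudgetExceeded as e:
        return {"status": "stopped", "stop_reason": e.stop_reason, "steps": e.state.steps}

# run_forever(BudgetPolicy(max_steps=5)) stops on the 6th step attempt
# with stop_reason="max_steps", instead of spinning forever.
```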
JAVASCRIPT
const TOOL_USD = {
  "search.read": 0.0,
  "http.get": 0.0,
  "browser.run": 0.2, // placeholder
};

export class BudgetExceeded extends Error {
  constructor(stopReason, { state }) {
    super(stopReason);
    this.stopReason = stopReason;
    this.state = state;
  }
}

export class BudgetGuard {
  constructor(policy) {
    this.policy = policy;
    this.state = {
      startedAtMs: Date.now(),
      steps: 0,
      toolCalls: 0,
      tokensIn: 0,
      tokensOut: 0,
      toolUsd: 0,
    };
  }

  elapsedS() {
    return (Date.now() - this.state.startedAtMs) / 1000;
  }

  estimateModelUsd(tokensIn, tokensOut) {
    // Replace with your real pricing model(s). Approximate is fine for guards.
    return (tokensIn + tokensOut) * 0.000002;
  }

  totalUsd() {
    return this.estimateModelUsd(this.state.tokensIn, this.state.tokensOut) + this.state.toolUsd;
  }

  check() {
    if (this.state.steps > this.policy.maxSteps) throw new BudgetExceeded("max_steps", { state: this.state });
    if (this.elapsedS() > this.policy.maxSeconds) throw new BudgetExceeded("max_seconds", { state: this.state });
    if (this.state.toolCalls > this.policy.maxToolCalls) throw new BudgetExceeded("max_tool_calls", { state: this.state });
    if (this.totalUsd() > this.policy.maxUsd) throw new BudgetExceeded("max_usd", { state: this.state });
  }

  onStep() {
    this.state.steps += 1;
    this.check();
  }

  onModelCall({ tokensIn, tokensOut }) {
    this.state.tokensIn += tokensIn;
    this.state.tokensOut += tokensOut;
    this.check();
  }

  onToolCall({ tool }) {
    this.state.toolCalls += 1;
    this.state.toolUsd += Number(TOOL_USD[tool] || 0);
    this.check();
  }
}

Real failure case (incident-style, with numbers)

We shipped an internal “support helper” agent. It had a browser tool. No budgets. (Yes, really.)

Then the vendor search endpoint got flaky for ~90 minutes. The agent’s strategy became: “try again, slightly different query”.

Impact:

  • tool calls/run: 4 → 31
  • median latency: 6s → 58s
  • spend: +$1,120 in one afternoon (mostly browser runs)
  • on-call time: ~2.5 hours chasing “why is support slow?”

Fix:

  1. hard budgets per run (steps/time/tool calls/USD)
  2. explicit stop reasons returned to the UI
  3. alerting on tool_calls/run and stop_reason=max_usd
  4. a degrade mode: “no browser during vendor incidents”

Budgets didn’t make the agent smarter. They made it survivable.
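The degrade mode from step 4 can be as simple as an incident flag that gates the tool allowlist. A minimal sketch (tool names match the guard example; the incident flag is illustrative):

```python
EXPENSIVE_TOOLS = {"browser.run"}

def allowed_tools(base_allowlist: set[str], vendor_incident: bool) -> set[str]:
    """During a vendor incident, drop expensive tools instead of letting retries burn budget."""
    return base_allowlist - EXPENSIVE_TOOLS if vendor_incident else base_allowlist

# allowed_tools({"search.read", "http.get", "browser.run"}, vendor_incident=True)
# keeps only the cheap tools: {"search.read", "http.get"}
```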

Trade-offs

  • Tight budgets will stop some legitimate hard cases.
  • Cost estimation is approximate (but good enough to stop runaway runs).
  • You’ll need escalation paths (approve bigger budgets) for real “long” tasks.

When NOT to use

  • If the task is deterministic, don’t use an agent. Use a workflow with fixed costs.
  • If you can’t return partial output + stop reasons, budgets will look like random failures.
  • If you can’t measure anything, start with time/tool-call caps and add cost later.

Copy-paste checklist

  • [ ] Enforce budgets in one choke point (loop + tool gateway)
  • [ ] Cap: steps, seconds, tool calls, USD
  • [ ] Track tokens + tool calls + rough spend
  • [ ] Return stop reasons (not silent timeouts)
  • [ ] Add budget tiers (default vs approved)
  • [ ] Alert on spend spikes + stop_reason distribution changes
  • [ ] Define degrade mode behavior during incidents

Safe default config snippet (JSON/YAML)

YAML
budgets:
  default:
    max_steps: 25
    max_seconds: 60
    max_tool_calls: 12
    max_usd: 1.0
  approved:
    max_steps: 80
    max_seconds: 240
    max_tool_calls: 40
    max_usd: 8.0
stop_reasons:
  return_to_user: true
  log: true
  alert_on: ["max_usd", "max_seconds", "max_tool_calls"]
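A minimal loader for the tiers above, using the JSON shape of the same config so it needs only the standard library (field names match the snippet; the loader itself is a sketch, not a prescribed API):

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class BudgetPolicy:
    max_steps: int
    max_seconds: int
    max_tool_calls: int
    max_usd: float

# The same tiers as the YAML above, expressed as JSON.
CONFIG = """
{
  "budgets": {
    "default":  {"max_steps": 25, "max_seconds": 60,  "max_tool_calls": 12, "max_usd": 1.0},
    "approved": {"max_steps": 80, "max_seconds": 240, "max_tool_calls": 40, "max_usd": 8.0}
  }
}
"""

def load_policies(text: str) -> dict[str, BudgetPolicy]:
    raw = json.loads(text)["budgets"]
    return {tier: BudgetPolicy(**fields) for tier, fields in raw.items()}

policies = load_policies(CONFIG)
# policies["default"] is the safe tier; policies["approved"] requires escalation.
```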

FAQ

What budget should we start with?
Start with time + tool-call caps. Then add USD once you can estimate model/tool costs. The first goal is stopping runaway loops.
Should budgets be hard-fail or degrade?
Prefer degrade with a clear stop reason and partial output. Hard-failing trains users to spam retries.
How do we handle tasks that need more budget?
Escalate: require approval, run async, or move to a higher budget tier with stricter logging.
Do budgets replace rate limits?
No. Rate limits protect dependencies. Budgets protect you from your own loop.


Implement in OnceOnly
Budgets + permissions you can enforce at the boundary.
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
writes:
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true }
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.