Cost Limits for Agents (Token + Tool Spend) + Code

Token budgets don’t stop tool spend. Cost limits track model + tools together, gate expensive actions, and force explicit approval before the agent burns real money.
On this page
  1. Problem-first intro
  2. Why this fails in production
  3. 1) Cost is multi-dimensional
  4. 2) You don’t know cost unless you meter it
  5. 3) Expensive tools need gating, not “be careful”
  6. Implementation example (real code)
  7. Real failure case (incident-style, with numbers)
  8. Trade-offs
  9. When NOT to use
  10. Copy-paste checklist
  11. Safe default config snippet (JSON/YAML)
  12. FAQ

Problem-first intro

You set a “token limit”.

The agent still costs $12.

Because the expensive part wasn’t tokens. It was tools:

  • browser automation
  • vendor APIs
  • OCR
  • third-party search

Cost limits are governance because they force a hard question: “Is this run worth another $X?”

If the answer is “maybe”, you need an approval gate, not a longer prompt. If you only cap tokens, you’re basically putting a speed limit on one wheel.

Why this fails in production

1) Cost is multi-dimensional

You pay for:

  • model tokens (input + output)
  • tool calls (per call, per minute, per document)
  • retries (multipliers)
  • latency (compute + queue time)

If you only cap one axis, the agent will “escape” via the others.
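A back-of-the-envelope comparison makes the escape concrete. All prices below are illustrative assumptions, not real vendor rates:

```python
# Illustrative only: assumed prices, not real vendor rates.
TOKEN_USD = 0.000002       # assumed blended price per token
BROWSER_USD = 0.20         # assumed price per browser.run call

tokens = 8_000             # a modest run, well under most token caps
browser_calls = 40         # ...that still browses 40 times

model_cost = tokens * TOKEN_USD
tool_cost = browser_calls * BROWSER_USD
print(f"model: ${model_cost:.2f}, tools: ${tool_cost:.2f}")
# model: $0.02, tools: $8.00
```

A token cap tuned to catch the $0.02 axis never even sees the $8.00 axis.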

2) You don’t know cost unless you meter it

Teams discover spend in invoices because they didn’t log:

  • tokens/run
  • tool_calls/run
  • tool_usd/run
  • stop_reason
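A minimal shape for that per-run record, using the field names above. The logger itself is whatever you already run; this only sketches the payload:

```python
# Minimal per-run metrics payload; field names match the list above.
import json

def run_record(run_id: str, tokens: int, tool_calls: int,
               tool_usd: float, stop_reason: str) -> str:
    return json.dumps({
        "run_id": run_id,
        "tokens": tokens,
        "tool_calls": tool_calls,
        "tool_usd": round(tool_usd, 4),
        "stop_reason": stop_reason,  # e.g. "done", "max_usd", "approval_required"
    })

print(run_record("r-123", 5400, 7, 1.40, "max_usd"))
```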

3) Expensive tools need gating, not “be careful”

If browser.run costs $0.20 and the agent can call it 40 times, you built a slot machine.

Put a gate in code:

  • tiered budgets
  • human approval for expensive actions
  • safe defaults: no browser unless needed

Implementation example (real code)

This pattern:

  • tracks running spend (model + tools)
  • stops when max_usd is hit
  • requires approval above a threshold (before calling expensive tools)
PYTHON
from dataclasses import dataclass
from typing import Any

TOOL_USD = {"browser.run": 0.20, "ocr.run": 0.10}


@dataclass(frozen=True)
class CostPolicy:
    max_usd: float = 2.00
    approval_threshold_usd: float = 0.75


class ApprovalRequired(RuntimeError):
    pass


class CostLimitExceeded(RuntimeError):
    pass


class CostMeter:
    def __init__(self, policy: CostPolicy):
        self.policy = policy
        self.usd = 0.0

    def add_model(self, *, tokens_in: int, tokens_out: int) -> None:
        self.usd += (tokens_in + tokens_out) * 0.000002  # placeholder blended rate
        self._check()

    def add_tool(self, *, tool: str) -> None:
        self.usd += float(TOOL_USD.get(tool, 0.0))
        self._check()

    def gate_tool(self, *, tool: str) -> None:
        cost = float(TOOL_USD.get(tool, 0.0))
        if cost == 0.0:
            return  # free tools never need spend approval
        projected = self.usd + cost
        if projected >= self.policy.approval_threshold_usd:
            raise ApprovalRequired(f"approval required before calling {tool} (projected_usd={projected:.2f})")

    def _check(self) -> None:
        if self.usd >= self.policy.max_usd:
            raise CostLimitExceeded(f"max_usd exceeded ({self.usd:.2f})")


def run(task: str, *, policy: CostPolicy) -> dict[str, Any]:
    meter = CostMeter(policy)
    while True:
        action, tokens_in, tokens_out = llm_decide(task)  # (pseudo)
        meter.add_model(tokens_in=tokens_in, tokens_out=tokens_out)

        if action.kind != "tool":
            return {"status": "ok", "answer": action.final_answer, "usd": meter.usd}

        meter.gate_tool(tool=action.name)
        obs = call_tool(action.name, action.args)  # (pseudo)
        meter.add_tool(tool=action.name)
        task = update(task, action, obs)  # (pseudo)
JAVASCRIPT
const TOOL_USD = { "browser.run": 0.2, "ocr.run": 0.1 };

export class ApprovalRequired extends Error {}
export class CostLimitExceeded extends Error {}

export class CostMeter {
  constructor({ maxUsd = 2.0, approvalThresholdUsd = 0.75 } = {}) {
    this.maxUsd = maxUsd;
    this.approvalThresholdUsd = approvalThresholdUsd;
    this.usd = 0;
  }

  addModel({ tokensIn, tokensOut }) {
    this.usd += (tokensIn + tokensOut) * 0.000002; // placeholder blended rate
    this.check();
  }

  addTool({ tool }) {
    this.usd += Number(TOOL_USD[tool] || 0);
    this.check();
  }

  gateTool({ tool }) {
    const cost = Number(TOOL_USD[tool] || 0);
    if (cost === 0) return; // free tools never need spend approval
    const projected = this.usd + cost;
    if (projected >= this.approvalThresholdUsd) {
      throw new ApprovalRequired(`approval required before calling ${tool} (projected_usd=${projected.toFixed(2)})`);
    }
  }

  check() {
    if (this.usd >= this.maxUsd) {
      throw new CostLimitExceeded(`max_usd exceeded (${this.usd.toFixed(2)})`);
    }
  }
}
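A condensed, standalone driver showing how the approval gate stops a loop of expensive calls well before max_usd is reached. The meter is restated here in miniature so the sketch runs on its own:

```python
# Condensed restatement of the meter so this sketch runs standalone.
TOOL_USD = {"browser.run": 0.20}

class ApprovalRequired(RuntimeError):
    pass

class CostMeter:
    def __init__(self, max_usd: float = 2.0, approval_threshold_usd: float = 0.75):
        self.max_usd = max_usd
        self.approval_threshold_usd = approval_threshold_usd
        self.usd = 0.0

    def gate_tool(self, tool: str) -> None:
        projected = self.usd + TOOL_USD.get(tool, 0.0)
        if projected >= self.approval_threshold_usd:
            raise ApprovalRequired(tool)

    def add_tool(self, tool: str) -> None:
        self.usd += TOOL_USD.get(tool, 0.0)

meter = CostMeter()
stop = None
for _ in range(10):  # an agent that wants 10 browser calls
    try:
        meter.gate_tool("browser.run")
        meter.add_tool("browser.run")
    except ApprovalRequired:
        stop = "approval_required"  # surface to a human; don't retry silently
        break

print(stop, round(meter.usd, 2))  # approval_required 0.6
```

The caller's job is to turn `ApprovalRequired` into a visible question for a human, not to swallow it and loop.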

Real failure case (incident-style, with numbers)

We had an agent that “verified information” by browsing. It was correct. It was also expensive.

Someone changed the prompt to “double-check sources”. That turned into more browser calls.

Impact over 3 days:

  • browser calls/run: 1.4 → 6.8
  • spend: +$980 vs baseline
  • nobody noticed until finance asked

Fix:

  1. cost meter that combines model + tools
  2. approval gate for browser.run above $0.75 projected spend
  3. a cheap path first: use kb.read / cached sources before browsing
  4. alerting on tool_usd/run
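The alerting step can start as a simple ratio check of today's mean tool_usd/run against a trailing baseline. The 2x threshold and the per-run figures below are assumptions shaped like this incident:

```python
# Illustrative spike check on tool_usd/run; the 2x ratio and the
# per-run figures are assumptions shaped like the incident above.
def tool_usd_alert(baseline_runs: list[float], todays_runs: list[float],
                   ratio: float = 2.0) -> bool:
    base = sum(baseline_runs) / len(baseline_runs)
    today = sum(todays_runs) / len(todays_runs)
    return today > base * ratio

# baseline ~1.4 browser calls/run (~$0.28); after the prompt change ~6.8 (~$1.36)
print(tool_usd_alert([0.28, 0.30, 0.26], [1.36, 1.30, 1.42]))  # True
```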

“It’s correct” isn’t a budget policy.

Trade-offs

  • Approval gates reduce automation (that’s the point).
  • Projected cost is imperfect (still better than unlimited).
  • Some tasks need higher budgets; create explicit tiers with better logging.

When NOT to use

  • If the agent never calls paid tools, a simple token budget may be enough (still track tokens).
  • If your cost model is unknown, start with tool-call caps per tool and add USD later.
  • If you can’t build approvals, don’t expose expensive tools to unattended loops.
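The per-tool call caps mentioned above need no pricing data at all. A sketch (cap values are made up):

```python
# Sketch: per-tool call caps, no pricing data required.
# Cap values are assumptions; tune them per tool.
from collections import Counter

CAPS = {"browser.run": 5, "ocr.run": 10}
DEFAULT_CAP = 50  # generous cap for tools nobody priced yet

class ToolCapExceeded(RuntimeError):
    pass

calls = Counter()

def check_cap(tool: str) -> None:
    calls[tool] += 1
    if calls[tool] > CAPS.get(tool, DEFAULT_CAP):
        raise ToolCapExceeded(tool)

for _ in range(5):
    check_cap("browser.run")  # within cap: no exception
# a 6th call would raise ToolCapExceeded("browser.run")
```

Once billing exports arrive, the same counter can be weighted by per-call USD and folded into a proper cost meter.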

Copy-paste checklist

  • [ ] Track spend per run (model + tools)
  • [ ] Cap max USD per run
  • [ ] Gate expensive tools behind approvals
  • [ ] Prefer cheap tools first (kb/cache) before paid tools
  • [ ] Alert on tool_usd/run spikes and drift
  • [ ] Return stop reasons users can act on

Safe default config snippet (JSON/YAML)

YAML
cost:
  max_usd_per_run: 2.0
  approval_threshold_usd: 0.75
tools:
  priced:
    browser.run: 0.20
    ocr.run: 0.10
approvals:
  required_when_projected_over_threshold: true
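One way to load that config into the CostPolicy from the implementation section. Shown here in its JSON form, and only the cost and pricing keys, since that is all the meter needs:

```python
# Sketch: map the safe-default config (JSON form; cost and pricing
# keys only) onto the CostPolicy used by the meter.
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CostPolicy:
    max_usd: float
    approval_threshold_usd: float

raw = json.loads("""
{
  "cost": {"max_usd_per_run": 2.0, "approval_threshold_usd": 0.75},
  "tools": {"priced": {"browser.run": 0.20, "ocr.run": 0.10}}
}
""")

policy = CostPolicy(
    max_usd=float(raw["cost"]["max_usd_per_run"]),
    approval_threshold_usd=float(raw["cost"]["approval_threshold_usd"]),
)
TOOL_USD = {k: float(v) for k, v in raw["tools"]["priced"].items()}
```

For the YAML form, a parser such as PyYAML's `yaml.safe_load` yields the same nested dict.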

FAQ

Q: Do I need exact pricing to enforce cost limits?
A: No. Approximate is fine to stop runaway behavior. Tighten it later with real tool billing data.

Q: Should approvals be per-tool or per-run?
A: Start per-tool for expensive actions. Later add per-run tiers (default vs approved) for long investigations.

Q: What if users always approve?
A: Then at least the spend is intentional and auditable. “Accidental spend” is what hurts.

Q: Isn’t this just budgets again?
A: Yes, but cost limits force you to count tools. Token budgets don’t.


6 min read · Updated Mar 2026 · Difficulty: ★★★
Implement in OnceOnly
Budgets + permissions you can enforce at the boundary.
YAML
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
writes:
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true }
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.