Cost Limits for Agents (Token + Tool Spend) + Code

Token budgets don’t stop tool spend. Cost limits track model + tools together, gate expensive actions, and force explicit approval before the agent burns real money.
On this page
  1. Problem-first intro
  2. Why this fails in production
  3. 1) Cost is multi-dimensional
  4. 2) You don’t know cost unless you meter it
  5. 3) Expensive tools need gating, not “be careful”
  6. Implementation example (real code)
  7. Real failure case (incident-style, with numbers)
  8. Trade-offs
  9. When NOT to use
  10. Copy-paste checklist
  11. Safe default config snippet (JSON/YAML)
  12. FAQ

Problem-first intro

You set a “token limit”.

The agent still costs $12.

Because the expensive part wasn’t tokens. It was tools:

  • browser automation
  • vendor APIs
  • OCR
  • third-party search

Cost limits are governance because they force a hard question: “Is this run worth another $X?”

If the answer is “maybe”, you need an approval gate, not a longer prompt. If you only cap tokens, you’re basically putting a speed limit on one wheel.

Why this fails in production

1) Cost is multi-dimensional

You pay for:

  • model tokens (input + output)
  • tool calls (per call, per minute, per document)
  • retries (multipliers)
  • latency (compute + queue time)

If you only cap one axis, the agent will “escape” via the others.
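A back-of-the-envelope comparison makes the escape concrete. All prices below are illustrative assumptions, not real vendor rates:

```python
# Illustrative only: assumed prices, not real vendor rates.
TOKEN_USD = 0.000002       # assumed blended price per token
BROWSER_USD = 0.20         # assumed price per browser.run call

tokens = 8_000             # a modest run, well under most token caps
browser_calls = 40         # ...that still browses 40 times

model_cost = tokens * TOKEN_USD
tool_cost = browser_calls * BROWSER_USD
print(f"model: ${model_cost:.2f}, tools: ${tool_cost:.2f}")
# model: $0.02, tools: $8.00
```

A token cap tuned to catch the $0.02 axis never even sees the $8.00 axis.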

2) You don’t know cost unless you meter it

Teams discover spend in invoices because they didn’t log:

  • tokens/run
  • tool_calls/run
  • tool_usd/run
  • stop_reason
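A minimal shape for that per-run record, using the field names above. The logger itself is whatever you already run; this only sketches the payload:

```python
# Minimal per-run metrics payload; field names match the list above.
import json

def run_record(run_id: str, tokens: int, tool_calls: int,
               tool_usd: float, stop_reason: str) -> str:
    return json.dumps({
        "run_id": run_id,
        "tokens": tokens,
        "tool_calls": tool_calls,
        "tool_usd": round(tool_usd, 4),
        "stop_reason": stop_reason,  # e.g. "done", "max_usd", "approval_required"
    })

print(run_record("r-123", 5400, 7, 1.40, "max_usd"))
```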

3) Expensive tools need gating, not “be careful”

If browser.run costs $0.20 and the agent can call it 40 times, you built a slot machine.

Put a gate in code:

  • tiered budgets
  • human approval for expensive actions
  • safe defaults: no browser unless needed

Implementation example (real code)

This pattern:

  • tracks running spend (model + tools)
  • stops when max_usd is hit
  • requires approval above a threshold (before calling expensive tools)
PYTHON
from dataclasses import dataclass
from typing import Any

TOOL_USD = {"browser.run": 0.20, "ocr.run": 0.10}


@dataclass(frozen=True)
class CostPolicy:
    max_usd: float = 2.00
    approval_threshold_usd: float = 0.75


class ApprovalRequired(RuntimeError):
    pass


class CostLimitExceeded(RuntimeError):
    pass


class CostMeter:
    def __init__(self, policy: CostPolicy):
        self.policy = policy
        self.usd = 0.0

    def add_model(self, *, tokens_in: int, tokens_out: int) -> None:
        self.usd += (tokens_in + tokens_out) * 0.000002  # placeholder blended rate
        self._check()

    def add_tool(self, *, tool: str) -> None:
        self.usd += float(TOOL_USD.get(tool, 0.0))
        self._check()

    def gate_tool(self, *, tool: str) -> None:
        cost = float(TOOL_USD.get(tool, 0.0))
        if cost == 0.0:
            return  # free tools never need spend approval
        projected = self.usd + cost
        if projected >= self.policy.approval_threshold_usd:
            raise ApprovalRequired(f"approval required before calling {tool} (projected_usd={projected:.2f})")

    def _check(self) -> None:
        if self.usd >= self.policy.max_usd:
            raise CostLimitExceeded(f"max_usd exceeded ({self.usd:.2f})")


def run(task: str, *, policy: CostPolicy) -> dict[str, Any]:
    meter = CostMeter(policy)
    while True:
        action, tokens_in, tokens_out = llm_decide(task)  # (pseudo)
        meter.add_model(tokens_in=tokens_in, tokens_out=tokens_out)

        if action.kind != "tool":
            return {"status": "ok", "answer": action.final_answer, "usd": meter.usd}

        meter.gate_tool(tool=action.name)
        obs = call_tool(action.name, action.args)  # (pseudo)
        meter.add_tool(tool=action.name)
        task = update(task, action, obs)  # (pseudo)
JAVASCRIPT
const TOOL_USD = { "browser.run": 0.2, "ocr.run": 0.1 };

export class ApprovalRequired extends Error {}
export class CostLimitExceeded extends Error {}

export class CostMeter {
  constructor({ maxUsd = 2.0, approvalThresholdUsd = 0.75 } = {}) {
    this.maxUsd = maxUsd;
    this.approvalThresholdUsd = approvalThresholdUsd;
    this.usd = 0;
  }

  addModel({ tokensIn, tokensOut }) {
    this.usd += (tokensIn + tokensOut) * 0.000002; // placeholder blended rate
    this.check();
  }

  addTool({ tool }) {
    this.usd += Number(TOOL_USD[tool] || 0);
    this.check();
  }

  gateTool({ tool }) {
    const cost = Number(TOOL_USD[tool] || 0);
    if (cost === 0) return; // free tools never need spend approval
    const projected = this.usd + cost;
    if (projected >= this.approvalThresholdUsd) {
      throw new ApprovalRequired(`approval required before calling ${tool} (projected_usd=${projected.toFixed(2)})`);
    }
  }

  check() {
    if (this.usd >= this.maxUsd) {
      throw new CostLimitExceeded(`max_usd exceeded (${this.usd.toFixed(2)})`);
    }
  }
}
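A condensed, standalone driver showing how the approval gate stops a loop of expensive calls well before max_usd is reached. The meter is restated here in miniature so the sketch runs on its own:

```python
# Condensed restatement of the meter so this sketch runs standalone.
TOOL_USD = {"browser.run": 0.20}

class ApprovalRequired(RuntimeError):
    pass

class CostMeter:
    def __init__(self, max_usd: float = 2.0, approval_threshold_usd: float = 0.75):
        self.max_usd = max_usd
        self.approval_threshold_usd = approval_threshold_usd
        self.usd = 0.0

    def gate_tool(self, tool: str) -> None:
        projected = self.usd + TOOL_USD.get(tool, 0.0)
        if projected >= self.approval_threshold_usd:
            raise ApprovalRequired(tool)

    def add_tool(self, tool: str) -> None:
        self.usd += TOOL_USD.get(tool, 0.0)

meter = CostMeter()
stop = None
for _ in range(10):  # an agent that wants 10 browser calls
    try:
        meter.gate_tool("browser.run")
        meter.add_tool("browser.run")
    except ApprovalRequired:
        stop = "approval_required"  # surface to a human; don't retry silently
        break

print(stop, round(meter.usd, 2))  # approval_required 0.6
```

The caller's job is to turn `ApprovalRequired` into a visible question for a human, not to swallow it and loop.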

Real failure case (incident-style, with numbers)

We had an agent that “verified information” by browsing. It was correct. It was also expensive.

Someone changed the prompt to “double-check sources”. That turned into more browser calls.

Impact over 3 days:

  • browser calls/run: 1.4 → 6.8
  • spend: +$980 vs baseline
  • nobody noticed until finance asked

Fix:

  1. cost meter that combines model + tools
  2. approval gate for browser.run above $0.75 projected spend
  3. a cheap path first: use kb.read / cached sources before browsing
  4. alerting on tool_usd/run
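The alerting step can start as a simple ratio check of today's mean tool_usd/run against a trailing baseline. The 2x threshold and the per-run figures below are assumptions shaped like this incident:

```python
# Illustrative spike check on tool_usd/run; the 2x ratio and the
# per-run figures are assumptions shaped like the incident above.
def tool_usd_alert(baseline_runs: list[float], todays_runs: list[float],
                   ratio: float = 2.0) -> bool:
    base = sum(baseline_runs) / len(baseline_runs)
    today = sum(todays_runs) / len(todays_runs)
    return today > base * ratio

# baseline ~1.4 browser calls/run (~$0.28); after the prompt change ~6.8 (~$1.36)
print(tool_usd_alert([0.28, 0.30, 0.26], [1.36, 1.30, 1.42]))  # True
```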

“It’s correct” isn’t a budget policy.

Trade-offs

  • Approval gates reduce automation (that’s the point).
  • Projected cost is imperfect (still better than unlimited).
  • Some tasks need higher budgets; create explicit tiers with better logging.

When NOT to use

  • If the agent never calls paid tools, a simple token budget may be enough (still track tokens).
  • If your cost model is unknown, start with tool-call caps per tool and add USD later.
  • If you can’t build approvals, don’t expose expensive tools to unattended loops.
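The per-tool call caps mentioned above need no pricing data at all. A sketch (cap values are made up):

```python
# Sketch: per-tool call caps, no pricing data required.
# Cap values are assumptions; tune them per tool.
from collections import Counter

CAPS = {"browser.run": 5, "ocr.run": 10}
DEFAULT_CAP = 50  # generous cap for tools nobody priced yet

class ToolCapExceeded(RuntimeError):
    pass

calls = Counter()

def check_cap(tool: str) -> None:
    calls[tool] += 1
    if calls[tool] > CAPS.get(tool, DEFAULT_CAP):
        raise ToolCapExceeded(tool)

for _ in range(5):
    check_cap("browser.run")  # within cap: no exception
# a 6th call would raise ToolCapExceeded("browser.run")
```

Once billing exports arrive, the same counter can be weighted by per-call USD and folded into a proper cost meter.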

Copy-paste checklist

  • [ ] Track spend per run (model + tools)
  • [ ] Cap max USD per run
  • [ ] Gate expensive tools behind approvals
  • [ ] Prefer cheap tools first (kb/cache) before paid tools
  • [ ] Alert on tool_usd/run spikes and drift
  • [ ] Return stop reasons users can act on

Safe default config snippet (JSON/YAML)

YAML
cost:
  max_usd_per_run: 2.0
  approval_threshold_usd: 0.75
tools:
  priced:
    browser.run: 0.20
    ocr.run: 0.10
approvals:
  required_when_projected_over_threshold: true
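One way to load that config into the CostPolicy from the implementation section. Shown here in its JSON form, and only the cost and pricing keys, since that is all the meter needs:

```python
# Sketch: map the safe-default config (JSON form; cost and pricing
# keys only) onto the CostPolicy used by the meter.
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CostPolicy:
    max_usd: float
    approval_threshold_usd: float

raw = json.loads("""
{
  "cost": {"max_usd_per_run": 2.0, "approval_threshold_usd": 0.75},
  "tools": {"priced": {"browser.run": 0.20, "ocr.run": 0.10}}
}
""")

policy = CostPolicy(
    max_usd=float(raw["cost"]["max_usd_per_run"]),
    approval_threshold_usd=float(raw["cost"]["approval_threshold_usd"]),
)
TOOL_USD = {k: float(v) for k, v in raw["tools"]["priced"].items()}
```

For the YAML form, a parser such as PyYAML's `yaml.safe_load` yields the same nested dict.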

FAQ

Q: Do I need exact pricing to enforce cost limits?
A: No. Approximate is fine to stop runaway behavior. Tighten it later with real tool billing data.

Q: Should approvals be per-tool or per-run?
A: Start per-tool for expensive actions. Later add per-run tiers (default vs approved) for long investigations.

Q: What if users always approve?
A: Then at least the spend is intentional and auditable. “Accidental spend” is what hurts.

Q: Isn’t this just budgets again?
A: Yes, but cost limits force you to count tools. Token budgets don’t.


6 min read · Updated Mar 2026 · Difficulty: ★★★
Implement in OnceOnly
Budgets + permissions you can enforce at the boundary.
YAML
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
writes:
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true }
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.