Problem-first intro
Your agent is doing the wrong thing.
Not “the answer is a bit off”. Wrong thing as in:
- sending duplicate emails
- creating tickets in bulk
- hammering an API until you get rate-limited
And now the important part: you don’t have time to “fix the prompt and redeploy”.
You need a kill switch that:
- works right now
- is auditable (who flipped it, when, why)
- stops the side effects, not just the UI
If your kill switch only lives in the frontend, it’s not a kill switch. It’s a placebo. If your kill switch is an env var, flipping it is a deploy. Incidents don’t wait for deploys.
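A minimal sketch of the “auditable” requirement, assuming an in-memory dict stands in for the shared flag store (in practice Redis, a DB row, or a feature-flag service). `KillSwitchStore`, `flip`, and the `AuditEntry` fields are hypothetical names for illustration, not any library’s API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    flag: str
    value: bool
    actor: str   # who flipped it
    reason: str  # why
    at: str      # when (UTC, ISO 8601)


@dataclass
class KillSwitchStore:
    """Hypothetical shared store; a real one would live in Redis/DB/feature flags."""
    flags: dict[str, bool] = field(default_factory=dict)
    audit_log: list[AuditEntry] = field(default_factory=list)

    def flip(self, flag: str, value: bool, *, actor: str, reason: str) -> AuditEntry:
        # Refuse anonymous or unexplained flips: the audit trail is the point.
        if not actor or not reason:
            raise ValueError("actor and reason are required")
        entry = AuditEntry(
            flag=flag,
            value=value,
            actor=actor,
            reason=reason,
            at=datetime.now(timezone.utc).isoformat(),
        )
        # Record the audit entry before changing state, so no change goes unlogged.
        self.audit_log.append(entry)
        self.flags[flag] = value
        return entry

    def is_on(self, flag: str) -> bool:
        # Unknown flags default to "off".
        return self.flags.get(flag, False)
```

Because the state lives in a shared store rather than in the process environment, a flip like `store.flip("agent_kill_tenant:acme", True, actor="oncall", reason="duplicate emails")` takes effect on the next read, with no deploy.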
Why this fails in production
1) Teams build “pause” buttons that don’t pause anything
Common anti-design:
- UI hides the button
- API still runs the agent loop
- tool gateway still executes writes
If tool calls still go through, you didn’t stop the incident. You renamed it.
2) Kill switches that aren’t checked in the tool gateway leak
If you check the kill switch:
- in one route
- but not in background jobs
- and not in the tool gateway
…you will miss a path.
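One way to avoid missing a path is to make the tool gateway the only function that executes tools and to put the check there, not in each caller. A minimal sketch, assuming a module-level dict as a hypothetical stand-in for the shared flag store; `call_tool`, `api_route`, and `background_job` are illustrative names.

```python
class Killed(RuntimeError):
    pass


# Hypothetical stand-in for the shared flag store.
FLAGS: dict[str, bool] = {"agent_kill_global": False}
executed: list[tuple[str, dict]] = []  # side effects that actually ran


def call_tool(name: str, args: dict) -> dict:
    """The single gateway: every caller reaches tools through here."""
    if FLAGS["agent_kill_global"]:
        raise Killed(f"killed: stop_all (tool={name})")
    executed.append((name, args))  # stand-in for the real side effect
    return {"ok": True}


def api_route(args: dict) -> dict:
    # No kill check here on purpose: the gateway enforces it.
    return call_tool("ticket.create", args)


def background_job(args: dict) -> dict:
    # A worker path that a route-level check alone would have missed.
    return call_tool("ticket.create", args)
```

With the flag on, both `api_route` and `background_job` raise `Killed` and nothing new lands in `executed`. A check placed only in the route would have let the background job through.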
3) “Stop the run” is not enough
In-flight tool calls exist:
- long HTTP calls
- browser sessions
- queue workers already executing
Your kill switch needs semantics:
- stop new runs
- stop new tool calls
- optionally force-cancel in-flight work (best-effort)
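The three semantics can be sketched with `asyncio`, assuming tool calls run as asyncio tasks inside one process. `KillSwitch`, `tool_call`, `flip`, and `slow_http_call` are hypothetical names; a multi-process deployment would also need the shared store for the flag itself.

```python
import asyncio
from typing import Any, Awaitable, Callable


class Killed(RuntimeError):
    pass


class KillSwitch:
    """Hypothetical per-process switch: stop new runs, new calls, optionally in-flight work."""

    def __init__(self) -> None:
        self.stopped = False
        self._in_flight: set[asyncio.Task] = set()

    def check(self, what: str) -> None:
        # Semantics 1 and 2: refuse new runs and new tool calls after the flip.
        if self.stopped:
            raise Killed(f"killed: no {what}")

    async def tool_call(self, fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        self.check("new tool calls")
        task = asyncio.create_task(fn(*args))
        self._in_flight.add(task)
        try:
            return await task
        finally:
            self._in_flight.discard(task)

    def flip(self, *, force_cancel: bool = False) -> int:
        """Flip the switch; optionally request cancellation of in-flight calls."""
        self.stopped = True
        if not force_cancel:
            return 0
        # Semantics 3, best-effort: cancel() only requests cancellation. A request
        # already delivered to a remote API may still complete on the other side.
        return sum(1 for t in list(self._in_flight) if t.cancel())


async def slow_http_call(url: str) -> str:
    await asyncio.sleep(10)  # stands in for a long HTTP call
    return url


async def demo() -> tuple[int, bool, bool]:
    ks = KillSwitch()
    ks.check("new runs")  # passes: the switch is off
    call = asyncio.create_task(ks.tool_call(slow_http_call, "https://example.invalid"))
    await asyncio.sleep(0)  # let the call get in flight
    cancelled = ks.flip(force_cancel=True)
    try:
        await call
        was_cancelled = False
    except asyncio.CancelledError:
        was_cancelled = True
    try:
        await ks.tool_call(slow_http_call, "https://example.invalid")
        new_call_blocked = False
    except Killed:
        new_call_blocked = True
    return cancelled, was_cancelled, new_call_blocked
```

`asyncio.run(demo())` returns `(1, True, True)`: one in-flight call was cancelled and the next call was refused. Browser sessions and queue workers need their own cancellation hooks; this only covers work the process itself is awaiting.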
4) Scope matters: global vs per-tenant
You don’t want to stop the whole product because one tenant is triggering a loop. You want:
- global switch (nuclear)
- per-tenant switch (surgical)
- per-tool disable list (e.g., “no browser today”)
Implementation example (real code)
This pattern:
- reads kill switch state from a shared store (pseudo)
- checks it in two places: loop + tool gateway
- distinguishes “stop all” vs “disable writes”
```python
from dataclasses import dataclass, field
from typing import Any

# Tools with side effects that "disable writes" mode blocks.
WRITE_TOOLS = frozenset({"email.send", "db.write", "ticket.create", "ticket.close"})
MAX_STEPS = 1000


@dataclass(frozen=True)
class KillState:
    stop_all: bool = False        # nuclear: halt the loop and every tool call
    disable_writes: bool = False  # surgical: read-only degrade
    disabled_tools: frozenset[str] = field(default_factory=frozenset)


class Killed(RuntimeError):
    pass


def load_kill_state(*, tenant_id: str, mode_when_enabled: str = "disable_writes") -> KillState:
    # Pseudo: Redis/DB/feature-flag service. Must be fast + reliable.
    # Split global + per-tenant state: either flag turns the switch on.
    enabled = bool(
        read_flag("agent_kill_global")  # (pseudo)
        or read_flag(f"agent_kill_tenant:{tenant_id}")  # (pseudo)
    )
    disabled_tools = frozenset(read_list("agent_disabled_tools"))  # (pseudo)
    return KillState(
        # "stop_all" mode halts everything; any enabled mode blocks writes.
        stop_all=enabled and mode_when_enabled == "stop_all",
        disable_writes=enabled,
        disabled_tools=disabled_tools,
    )


def guard_tool_call(*, kill: KillState, tool: str) -> None:
    # Also call this inside the tool gateway, so callers outside the loop are covered.
    if kill.stop_all:
        raise Killed("killed: stop_all")
    if tool in kill.disabled_tools:
        raise Killed(f"killed: tool_disabled:{tool}")
    if kill.disable_writes and tool in WRITE_TOOLS:
        raise Killed(f"killed: writes_disabled:{tool}")


def run(task: str, *, tenant_id: str, tools) -> dict[str, Any]:
    for _ in range(MAX_STEPS):
        # Re-read (briefly cached) state every step so a flip mid-run takes effect.
        kill = load_kill_state(tenant_id=tenant_id)
        if kill.stop_all:
            return {"status": "stopped", "stop_reason": "killed"}

        action = llm_decide(task)  # (pseudo)
        if action.kind != "tool":
            return {"status": "ok", "answer": action.final_answer}

        try:
            guard_tool_call(kill=kill, tool=action.name)
        except Killed as e:
            # Stop cleanly with the block reason instead of crashing the worker.
            return {"status": "stopped", "stop_reason": str(e)}

        obs = tools.call(action.name, action.args)  # (pseudo)
        task = update(task, action, obs)  # (pseudo)

    return {"status": "stopped", "stop_reason": "max_steps"}
```

```javascript
const WRITE_TOOLS = new Set(["email.send", "db.write", "ticket.create", "ticket.close"]);

export class Killed extends Error {
  name = "Killed";
}

export function loadKillState({ tenantId, modeWhenEnabled = "disable_writes" }) {
  // Pseudo: feature-flag store. Must be fast + reliable.
  const enabled = Boolean(
    readFlag("agent_kill_global") || readFlag("agent_kill_tenant:" + tenantId) // (pseudo)
  );
  const disabledTools = new Set(readList("agent_disabled_tools")); // (pseudo)
  return {
    stopAll: enabled && modeWhenEnabled === "stop_all",
    disableWrites: enabled, // any enabled mode blocks writes
    disabledTools,
  };
}

export function guardToolCall({ kill, tool }) {
  if (kill.stopAll) throw new Killed("killed: stop_all");
  if (kill.disabledTools.has(tool)) throw new Killed("killed: tool_disabled:" + tool);
  if (kill.disableWrites && WRITE_TOOLS.has(tool)) throw new Killed("killed: writes_disabled:" + tool);
}
```

Real failure case (incident-style, with numbers)
We had an agent that drafted and sent follow-up emails. It sent them through a “send_email” tool and (oops) had no approval gate yet.
A prompt change caused it to interpret “follow up” as “send now”.
Impact in 22 minutes:
- 117 emails sent (some duplicates)
- we spent ~4 hours doing customer damage control
- the model wasn’t “hacked” — it was just wrong, loudly
The kill switch we thought we had was a UI toggle. Background workers ignored it.
Fix:
- kill switch enforced in tool gateway (writes disabled)
- per-tenant stop (so one tenant doesn’t nuke everyone)
- audit log entries when kill state blocks a tool call
- incident runbook: flip kill switch first, ask questions second
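The gateway and audit items of that fix can be sketched together, assuming a per-tenant lookup function and an append-only list standing in for the audit log. `GuardedGateway`, `writes_disabled`, `audit_sink`, and the audit record fields are hypothetical.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable

WRITE_TOOLS = frozenset({"email.send", "db.write", "ticket.create", "ticket.close"})


class Killed(RuntimeError):
    pass


@dataclass
class GuardedGateway:
    """Hypothetical tool gateway: enforces per-tenant write kill and audits every block."""
    execute: Callable[[str, dict], Any]      # the real tool runner (assumed)
    writes_disabled: Callable[[str], bool]   # per-tenant kill lookup (assumed)
    audit_sink: list[str] = field(default_factory=list)  # stand-in for an audit log

    def call(self, *, tenant_id: str, run_id: str, tool: str, args: dict) -> Any:
        if tool in WRITE_TOOLS and self.writes_disabled(tenant_id):
            # Record the block before raising, so the incident timeline shows
            # exactly which side effects the switch prevented.
            self.audit_sink.append(json.dumps({
                "event": "kill_switch_blocked_tool_call",
                "ts": time.time(),
                "tenant_id": tenant_id,
                "run_id": run_id,
                "tool": tool,
                "reason": "writes_disabled",
            }))
            raise Killed(f"killed: writes_disabled:{tool}")
        return self.execute(tool, args)
```

Because `writes_disabled` takes a tenant ID, killing one tenant leaves other tenants’ writes and the killed tenant’s reads untouched.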
Trade-offs
- Kill switches reduce availability during incidents. That’s better than irreversible writes.
- You have to test the kill path. Untested kill switches fail at the worst time.
- Shared-state reads add latency; keep it fast and cache briefly (seconds, not minutes).
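A short-TTL cache in front of the flag store can be sketched as follows, assuming one loader call per key (for example, one tenant’s kill state). `TtlCache` is a hypothetical helper, and the injectable `clock` exists only to make it testable.

```python
import time
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class TtlCache(Generic[T]):
    """Caches one loader result per key for a few seconds (hypothetical helper)."""

    def __init__(self, loader: Callable[[str], T], ttl_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, T]] = {}

    def get(self, key: str) -> T:
        now = self._clock()
        hit = self._entries.get(key)
        # Serve the cached value only while it is younger than the TTL.
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        value = self._loader(key)  # e.g. one read against the flag store
        self._entries[key] = (now, value)
        return value
```

With a 2-second TTL, each process reads the store at most once per key per 2 seconds, and a flip takes at most about 2 seconds to reach it. `time.monotonic` is used so wall-clock adjustments cannot stretch the TTL.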
When NOT to use
- Don’t use “kill switch” as a substitute for real governance (permissions, approvals, budgets).
- Don’t build a kill switch that’s only client-side. It will lie to you.
- Don’t rely on kill switches for “normal” flow. They’re for stopping the bleeding.
Copy-paste checklist
- [ ] Global kill switch (stop new runs)
- [ ] Per-tenant kill switch (surgical stop)
- [ ] Enforced in tool gateway (stops side effects)
- [ ] Disable writes mode (read-only degrade)
- [ ] Tool disable list (e.g., “no browser”)
- [ ] Audit logs for kill blocks + operator actions
- [ ] Tested runbook: flip, verify, drain, recover
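A kill-path drill that could run in CI is sketched below, assuming a fake gateway backed by an in-memory flag dict. `FakeGateway` and `kill_switch_drill` are hypothetical, and the tenant names are placeholders. It covers flip, verify, and recover; draining in-flight work depends on your worker setup and is left out.

```python
WRITE_TOOLS = frozenset({"email.send", "ticket.create"})


class Killed(RuntimeError):
    pass


class FakeGateway:
    """Hypothetical test double: a gateway backed by an in-memory flag dict."""

    def __init__(self) -> None:
        self.flags: dict[str, bool] = {}
        self.side_effects: list[str] = []

    def call(self, tenant_id: str, tool: str) -> None:
        killed = (self.flags.get("agent_kill_global")
                  or self.flags.get(f"agent_kill_tenant:{tenant_id}"))
        if killed and tool in WRITE_TOOLS:
            raise Killed(f"killed: writes_disabled:{tool}")
        self.side_effects.append(f"{tenant_id}:{tool}")


def kill_switch_drill(gw: FakeGateway) -> None:
    """Flip, verify, recover. Run it routinely, not first during an incident."""
    # Flip: turn on the per-tenant switch.
    gw.flags["agent_kill_tenant:acme"] = True
    # Verify: writes for that tenant are blocked...
    try:
        gw.call("acme", "email.send")
        raise AssertionError("write went through while killed")
    except Killed:
        pass
    # ...reads still work, and other tenants are unaffected.
    gw.call("acme", "crm.read")
    gw.call("globex", "email.send")
    # Recover: flip back and confirm writes resume.
    gw.flags["agent_kill_tenant:acme"] = False
    gw.call("acme", "email.send")
```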
Safe default config snippet (JSON/YAML)
```yaml
kill_switch:
  global_flag: "agent_kill_global"
  per_tenant_flag_prefix: "agent_kill_tenant:"
  mode_when_enabled: "disable_writes"
  disabled_tools_key: "agent_disabled_tools"
  cache_ttl_s: 2
```
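Validating this config at startup means a typo fails the deploy, not the incident. A minimal sketch: to stay within the standard library it parses the same structure as JSON (a YAML loader such as PyYAML’s `safe_load` would yield the same dict). `KillSwitchConfig`, `load_config`, `ALLOWED_MODES`, and the 10-second TTL ceiling are assumptions for illustration.

```python
import json
from dataclasses import dataclass

# Assumption: the two modes this page describes.
ALLOWED_MODES = {"disable_writes", "stop_all"}
MAX_TTL_S = 10  # assumption: an arbitrary "seconds, not minutes" ceiling


@dataclass(frozen=True)
class KillSwitchConfig:
    global_flag: str
    per_tenant_flag_prefix: str
    mode_when_enabled: str
    disabled_tools_key: str
    cache_ttl_s: float

    def tenant_flag(self, tenant_id: str) -> str:
        return self.per_tenant_flag_prefix + tenant_id


def load_config(raw: str) -> KillSwitchConfig:
    # Unknown or missing keys raise TypeError from the dataclass constructor.
    cfg = KillSwitchConfig(**json.loads(raw)["kill_switch"])
    if cfg.mode_when_enabled not in ALLOWED_MODES:
        raise ValueError(f"unknown mode_when_enabled: {cfg.mode_when_enabled!r}")
    if not 0 < cfg.cache_ttl_s <= MAX_TTL_S:
        raise ValueError("cache_ttl_s should be a few seconds, not minutes")
    return cfg
```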
FAQ
Q: Should the kill switch stop everything or only writes?
A: Default to disabling writes first. Stopping everything is the nuclear option when you can’t trust the loop at all.
Q: Where do we enforce the kill switch?
A: In the tool gateway and in the run loop. If it’s not enforced on tool calls, it’s not real.
Q: Can we cache kill state?
A: Yes, but keep TTL in seconds. Incidents are measured in seconds, not minutes.
Q: Do we need per-tenant kill switches?
A: If you’re multi-tenant: absolutely. Otherwise one customer’s incident becomes everyone’s outage.
Related pages
- Foundations: What makes an agent production-ready · Why agents fail in production
- Failure: Cascading tool failures · Tool spam loops
- Governance: Budget controls · Tool permissions
- Production stack: Production agent stack