The problem (in production)
You have a task: “handle support tickets”, “triage alerts”, “enrich leads”, “review code”.
Someone suggests an agent. Someone else suggests a workflow.
In a demo, the agent wins. In production, the winner is usually the thing you can operate.
The most expensive mistake we see is choosing an agent when you needed a workflow, and then adding governance until it’s basically a workflow anyway — except now it’s nondeterministic.
Quick decision (who should pick what)
- Pick a workflow when you can define steps, inputs, and success conditions. You’ll ship faster and sleep better.
- Pick an agent when the environment is messy (unknown docs, noisy tools) and you can’t enumerate all paths — but only if you’re willing to add budgets, permissions, and monitoring.
- If you’re not ready to build a control layer, don’t pick an agent. Pick a workflow.
Why teams choose wrong in production
1) They confuse “flexible” with “reliable”
Agents are flexible. Reliability comes from:
- budgets
- validations
- idempotency
- approvals
- monitoring
Without those, agents are flexible at creating incidents.
2) They underestimate governance cost
The first time an agent loops, you add step limits. The first time it spams a tool, you add tool budgets. The first time it writes incorrectly, you add approvals.
At that point, you’ve built a workflow… but with extra variance.
3) They start with writes
Agents with write tools in week one are a predictable failure. Start read-only.
4) Workflows fail loudly, agents fail quietly
Workflow failure: a step errors. Agent failure: it “kind of works” but gets slower, costlier, and weirder.
That’s drift. Drift is a production problem.
Comparison table
| Criteria | Workflow | LLM Agent | What matters in prod |
|---|---|---|---|
| Determinism | High | Low/medium | Debuggability, replay |
| Failure handling | Explicit | Emergent unless designed | Prevent thrash, stop reasons |
| Observability | Straightforward | Requires intentional tracing | “What did it do?” |
| Cost control | Predictable | Needs budgets + gating | No finance surprises |
| Change safety | Standard deploy | Drift-prone | Canary, golden tasks |
| Best for | Known paths | Unknown paths | Match system to reality |
Where it breaks in production
The failure modes differ:
Workflow breaks
- a step fails (timeout, 500)
- a queue backs up
- a schema changes
Fixes are mostly deterministic: retry policy, backoff, idempotency, rollbacks.
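Those deterministic fixes can be sketched in a few lines — retry with exponential backoff, plus an idempotency cache so a replayed step doesn't repeat its side effect. `TransientError` and the in-memory `_done` store are illustrative stand-ins, not a real library:

```python
import time
from typing import Any, Callable

class TransientError(RuntimeError):
    """Stand-in for a retryable step failure (timeout, 500)."""

def retry(fn: Callable[[], Any], *, attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> Any:
    # Exponential backoff: 0.5s, 1s, 2s, ...; re-raise after the last attempt.
    for i in range(attempts):
        try:
            return fn()
        except TransientError:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))

_done: dict[str, Any] = {}

def idempotent(key: str, fn: Callable[[], Any]) -> Any:
    # Same key -> same result; the side effect runs at most once.
    if key not in _done:
        _done[key] = fn()
    return _done[key]
```

In a real system the idempotency store would be durable (a database row keyed by request ID), but the contract is the same: replays are safe.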
Agent breaks
- tool spam loops (search thrash)
- partial outages amplify (retries in loops)
- prompt injection steers tool calls
- token overuse truncates policy
- silent drift changes behavior
Agents break like control systems, because they are control systems.
Implementation example (real code)
The “agent vs workflow” decision isn’t about libraries. It’s about boundaries.
Here’s a minimal boundary you can use for either:
- tool gateway with allowlist
- budgets (steps/tool calls/time)
- stop reasons
```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Budgets:
    max_steps: int = 25
    max_tool_calls: int = 12

class Stop(RuntimeError):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason

class ToolGateway:
    def __init__(self, *, allow: set[str]):
        self.allow = allow
        self.calls = 0

    def call(self, tool: str, args: dict[str, Any], *, budgets: Budgets) -> Any:
        self.calls += 1
        if self.calls > budgets.max_tool_calls:
            raise Stop("max_tool_calls")
        if tool not in self.allow:
            raise Stop(f"tool_denied:{tool}")
        return tool_impl(tool, args=args)  # (pseudo)

def workflow(task: str, *, budgets: Budgets) -> dict[str, Any]:
    tools = ToolGateway(allow={"kb.read"})
    try:
        doc = tools.call("kb.read", {"q": task}, budgets=budgets)
        return {"status": "ok", "answer": summarize(doc)}  # (pseudo)
    except Stop as e:
        return {"status": "stopped", "stop_reason": e.reason}

def agent(task: str, *, budgets: Budgets) -> dict[str, Any]:
    tools = ToolGateway(allow={"search.read", "kb.read", "http.get"})
    try:
        for _ in range(budgets.max_steps):
            action = llm_decide(task)  # (pseudo)
            if action.kind == "final":
                return {"status": "ok", "answer": action.final_answer}
            obs = tools.call(action.name, action.args, budgets=budgets)
            task = update(task, action, obs)  # (pseudo)
        return {"status": "stopped", "stop_reason": "max_steps"}
    except Stop as e:
        return {"status": "stopped", "stop_reason": e.reason}
```

The same gateway in JavaScript:

```javascript
export class Stop extends Error {
  constructor(reason) {
    super(reason);
    this.reason = reason;
  }
}

export class ToolGateway {
  constructor({ allow = [] } = {}) {
    this.allow = new Set(allow);
    this.calls = 0;
  }

  call(tool, args, { budgets }) {
    this.calls += 1;
    if (this.calls > budgets.maxToolCalls) throw new Stop("max_tool_calls");
    if (!this.allow.has(tool)) throw new Stop("tool_denied:" + tool);
    return toolImpl(tool, { args }); // (pseudo)
  }
}
```

Real incident (with numbers)
We saw a team replace a simple workflow with an agent “for flexibility”.
The workflow had fixed steps and predictable costs. The agent started calling search + browser tools because “maybe it helps”.
Impact in the first week:
- p95 latency: 1.9s → 9.7s
- spend: +$640 vs baseline
- and the worst part: incidents were harder to debug because behavior wasn’t deterministic
Fix:
- they moved 80% of the task back into a workflow
- the agent became a bounded “investigation step” behind strict budgets
- writes required approval
In production, hybrid usually wins: workflow for the known path, agent for the messy corner.
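That hybrid shape is easy to sketch: a router that keeps the workflow as the default path and only calls the bounded agent for ambiguous tasks, degrading to the workflow's answer when the agent stops. `is_ambiguous` and the `workflow`/`agent` callables are stand-ins for your own:

```python
from typing import Any, Callable

def hybrid(task: str, *,
           is_ambiguous: Callable[[str], bool],
           workflow: Callable[[str], dict[str, Any]],
           agent: Callable[[str], dict[str, Any]]) -> dict[str, Any]:
    # Known path stays deterministic.
    if not is_ambiguous(task):
        return workflow(task)
    # Messy corner: bounded agent, with the workflow as fallback,
    # so an agent stop degrades gracefully instead of becoming an incident.
    result = agent(task)
    if result.get("status") != "ok":
        return workflow(task)
    return result
```

The key design choice is the fallback: the agent can only ever improve on the workflow's answer, never replace it with a failure.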
Migration path (A → B)
Workflow → Agent (safe-ish)
- keep the workflow as the default path
- add an agent only for ambiguous sub-tasks (bounded)
- enforce budgets + permissions + monitoring first
- canary rollout + golden tasks to catch drift
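A minimal sketch of that golden-task gate, assuming a hand-picked golden set (the tasks and expected answers below are made up) and a `run` callable for the candidate model/prompt/tool version:

```python
from typing import Any, Callable

# Hypothetical golden set: fixed tasks with known-good answers,
# replayed on every model/prompt/tool change before rollout.
GOLDEN: list[tuple[str, str]] = [
    ("reset password for user", "kb_article_42"),
    ("refund duplicate charge", "kb_article_7"),
]

def golden_pass_rate(run: Callable[[str], dict[str, Any]]) -> float:
    # Fraction of golden tasks the candidate still answers correctly.
    passed = sum(1 for task, want in GOLDEN if run(task).get("answer") == want)
    return passed / len(GOLDEN)

def canary_ok(run: Callable[[str], dict[str, Any]], *, threshold: float = 1.0) -> bool:
    # Block rollout when the pass rate dips below the threshold,
    # so drift is caught before customers see it.
    return golden_pass_rate(run) >= threshold
```

Exact-match answers are the simplest check; fuzzier tasks need a scoring function, but the gate stays the same.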
Agent → Workflow (when you regret it)
- log traces and identify the common path
- codify common path as deterministic steps
- keep the agent only for exceptions
- delete “agent as default” once confidence is high
Decision guide
- If you can write a state machine for it → pick a workflow.
- If you can’t, but the cost of being wrong is low → bounded agent might work.
- If the cost of being wrong is high → workflow + approvals, or don’t automate.
- If you can’t afford monitoring and governance → don’t ship an agent.
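The guide compresses into a small helper — a sketch, with `cost_of_error` as an assumed "low"/"high" label rather than anything formal:

```python
def choose(can_enumerate_steps: bool, cost_of_error: str,
           has_monitoring: bool) -> str:
    # Encodes the decision guide above.
    if can_enumerate_steps:
        return "workflow"
    if not has_monitoring:
        return "workflow"  # don't ship an agent you can't observe
    if cost_of_error == "high":
        return "workflow + approvals"
    return "bounded agent"
```

Note the ordering: observability gates the agent option before the cost of error even gets considered.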
Trade-offs
- Workflows are less flexible.
- Agents require governance to be safe.
- Hybrid systems add complexity, but often reduce incident rate.
When NOT to use each
- Don’t use agents for irreversible writes without approvals.
- Don’t use agents when success conditions are crisp and steps are known.
- Don’t use workflows when the input space is too open-ended (you’ll just rebuild an agent poorly).
Checklist (copy/paste)
- [ ] Can you enumerate steps? If yes, start with a workflow.
- [ ] If you use an agent, add budgets + tool gateway first.
- [ ] Start read-only; gate writes behind approvals.
- [ ] Return stop reasons; don’t timeout silently.
- [ ] Monitor tokens, tool calls, latency, stop reasons.
- [ ] Canary changes to models/prompts/tools; expect drift.
Safe default config (YAML)

```yaml
mode:
  default: "workflow"
  agent_for_exceptions: true
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
tools:
  allow: ["kb.read", "search.read", "http.get"]
writes:
  require_approval: true
monitoring:
  track: ["tool_calls_per_run", "tokens_per_request", "latency_p95", "stop_reason"]
```
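A sketch of loading the `budgets` block into a `Budgets` dataclass (the earlier one, extended here with `max_seconds`), assuming the YAML has already been parsed into a dict — e.g. with `yaml.safe_load`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budgets:
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: int = 60

def budgets_from_config(cfg: dict) -> Budgets:
    # Missing keys fall back to the safe defaults; unknown keys are ignored,
    # so an old binary can read a newer config without crashing.
    b = cfg.get("budgets", {})
    return Budgets(
        max_steps=int(b.get("max_steps", 25)),
        max_tool_calls=int(b.get("max_tool_calls", 12)),
        max_seconds=int(b.get("max_seconds", 60)),
    )
```

Falling back to defaults (rather than failing) is a choice: it keeps budgets enforced even when the config file is incomplete.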
Related failures
- AI Agent Infinite Loop (detect + fix, with code)
- Budget blowup (when an agent burns money) + fixes + code
- Tool Spam Loops (agent failure + fixes + code)
- Token overuse incidents (prompt bloat) + fixes + code
- Tool response corruption (schema drift + truncation) + code
FAQ
Q: Can we use an agent without a tool gateway?
A: If there are no tools and no side effects, maybe. The moment tools exist, you need a gateway for policy and budgets.
Q: What’s the safest hybrid?
A: Workflow for the common path, bounded agent for investigations, approvals for writes.
Q: Why do agents drift more?
A: Model/prompt/tool changes shift decisions. Without golden tasks and canaries, regressions ship quietly.
Q: What’s the first metric to watch?
A: Tool calls/run. It moves before correctness complaints and before invoices.
Related pages
- Foundations: Workflow vs agent (start here) · Planning vs reactive agents
- Failure: Tool spam loops · Silent agent drift
- Governance: Budget controls · Tool permissions
- Production stack: Production agent stack