The problem (in production)
You want to ship an agent system that does real work, not a weekend demo.
Someone on the team says: “Let’s do multi-agent with CrewAI.” Someone else says: “We should use LangGraph; graphs are easier to reason about.”
Both can work. Both can also produce the same outcome in production: a slow, expensive, hard-to-debug system if you don’t build a control layer.
The question isn’t “which is cooler?” The question is: which one makes failure modes obvious and governable?
Quick decision (who should pick what)
- Pick CrewAI if you explicitly want role-based multi-agent collaboration and you can invest in orchestration + monitoring to prevent deadlocks/thrash.
- Pick LangGraph if you want explicit state + deterministic-ish transitions you can test, replay, and roll back without guessing what the model “meant”.
- If you don’t have strong budgets/permissions/monitoring yet, LangGraph-style explicit flow usually hurts less.
Why the choice goes wrong in production
1) They pick based on “demo vibes”
Multi-agent role play looks impressive. It also adds:
- coordination overhead
- waiting states
- circular dependencies
- more tool calls
If you’re not ready to instrument it, it’ll fail quietly.
2) They confuse “graph” with “safe”
A graph is not governance. It’s a place to put governance.
You still need:
- budgets
- permissions
- validation
- approvals for writes
- stop reasons
3) They don’t define state
If you can’t write down:
- current state
- allowed transitions
- stop conditions
…your system will drift into “agent chooses everything”, which is just a fancy way of saying “debugging is vibes”.
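A minimal sketch of what “writing it down” looks like (state names are hypothetical, not from any framework): current state, allowed transitions, and a hard refusal of anything else.

```python
# Illustrative state table: if a transition isn't written down, it's illegal.
ALLOWED_TRANSITIONS = {
    "triage": {"search", "answer"},
    "search": {"search", "answer"},  # self-loop allowed, but bounded by budgets
    "answer": {"done"},
}

def transition(current: str, nxt: str) -> str:
    # Refuse any transition that was never written down.
    allowed = ALLOWED_TRANSITIONS.get(current, set())
    if nxt not in allowed:
        raise RuntimeError(f"illegal_transition:{current}->{nxt}")
    return nxt
```

If this table is painful to write, that pain is information: it means nobody actually knows what the system is allowed to do.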
Comparison table
| Criterion | CrewAI | LangGraph | What matters in prod |
|---|---|---|---|
| Primary abstraction | Roles + collaboration | State + transitions | Debuggability |
| Determinism | Lower | Higher | Replay + tests |
| Failure handling | Emergent unless designed | Easier to encode | Stop reasons |
| Observability | You must add it | You must add it | “What did it do?” |
| Loop/deadlock risk | Higher | Medium | On-call load |
| Migration friendliness | Medium | High | Canaries/rollback |
Where it breaks in production
CrewAI-style multi-agent breaks
- agents wait on each other (deadlocks)
- roles “disagree” and loop
- more context passed around → token overuse
- tool spam (agents “helpfully” re-search)
LangGraph-style flow breaks
- state machine grows complex
- devs cram “just let the model decide” nodes everywhere
- missing validation on edges turns graphs into “unsafe pipes”
The common failure is the same: missing governance.
Implementation example (real code)
The production trick is to separate:
- your orchestration framework
- your control layer (which should survive framework changes)
This is a framework-agnostic tool gateway + budget guard you can wrap around either approach.
```python
from dataclasses import dataclass
from typing import Any, Callable
import time

@dataclass(frozen=True)
class Budgets:
    max_steps: int = 40
    max_tool_calls: int = 20
    max_seconds: int = 120

class Stop(RuntimeError):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason

class ToolGateway:
    def __init__(self, *, allow: set[str], impls: dict[str, Callable[..., Any]]):
        self.allow = allow
        self.impls = impls
        self.calls = 0

    def call(self, tool: str, args: dict[str, Any], *, budgets: Budgets) -> Any:
        self.calls += 1
        if self.calls > budgets.max_tool_calls:
            raise Stop("max_tool_calls")
        if tool not in self.allow:
            raise Stop(f"tool_denied:{tool}")
        fn = self.impls.get(tool)
        if not fn:
            raise Stop(f"tool_missing:{tool}")
        return fn(**args)

def run_framework(orchestration_fn, *, budgets: Budgets, tools: ToolGateway) -> dict[str, Any]:
    started = time.time()
    for step in range(budgets.max_steps):
        if time.time() - started > budgets.max_seconds:
            return {"status": "stopped", "stop_reason": "max_seconds"}
        try:
            # orchestration_fn must call tools via the ToolGateway only.
            out = orchestration_fn(step=step, tools=tools)  # (pseudo)
            if out.get("done"):
                return {"status": "ok", "result": out.get("result")}
        except Stop as e:
            return {"status": "stopped", "stop_reason": e.reason}
    return {"status": "stopped", "stop_reason": "max_steps"}
```

The same gateway and runner in JavaScript:

```javascript
export class Stop extends Error {
  constructor(reason) {
    super(reason);
    this.reason = reason;
  }
}

export class ToolGateway {
  constructor({ allow = [], impls = {} } = {}) {
    this.allow = new Set(allow);
    this.impls = impls;
    this.calls = 0;
  }

  call(tool, args, { budgets }) {
    this.calls += 1;
    if (this.calls > budgets.maxToolCalls) throw new Stop("max_tool_calls");
    if (!this.allow.has(tool)) throw new Stop("tool_denied:" + tool);
    const fn = this.impls[tool];
    if (!fn) throw new Stop("tool_missing:" + tool);
    return fn(args);
  }
}

export function runFramework(orchestrationFn, { budgets, tools }) {
  const started = Date.now();
  for (let step = 0; step < budgets.maxSteps; step++) {
    if ((Date.now() - started) / 1000 > budgets.maxSeconds) {
      return { status: "stopped", stop_reason: "max_seconds" };
    }
    try {
      const out = orchestrationFn({ step, tools }); // (pseudo)
      if (out && out.done) return { status: "ok", result: out.result };
    } catch (e) {
      if (e instanceof Stop) return { status: "stopped", stop_reason: e.reason };
      throw e;
    }
  }
  return { status: "stopped", stop_reason: "max_steps" };
}
```

Real incident (with numbers)
We saw a multi-agent system shipped for “support triage”. It was role-based, and it looked great in a demo.
In production:
- one role started “double-checking” by re-searching
- another role waited for the first role’s output
Impact over a day:
- tool calls/run: 6 → 24
- p95 latency: 4.1s → 21.6s
- spend: +$530 vs baseline
- on-call time: ~2 hours to identify that the issue was “agent coordination”, not an external outage
Fix:
- explicit step limits + repeat detection
- tool gateway dedupe for repeated search calls
- degrade mode during search instability
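The dedupe part of that fix can be sketched as a wrapper around whatever tool-call function you already have. This is a hypothetical wrapper, and it assumes args are JSON-serializable and the deduped tools are read-only (side-effect-free):

```python
import hashlib
import json
from typing import Any, Callable

class DedupingCalls:
    """Cache identical read-only tool calls within one run, so an agent
    that 'helpfully' re-searches does not multiply cost."""

    def __init__(self, inner_call: Callable[[str, dict], Any]):
        self.inner_call = inner_call  # e.g. a ToolGateway-style call function
        self.cache: dict[str, Any] = {}

    def call(self, tool: str, args: dict) -> Any:
        # Key on tool name + canonicalized args.
        key = hashlib.sha256(
            (tool + json.dumps(args, sort_keys=True)).encode()
        ).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.inner_call(tool, args)
        return self.cache[key]
```

In the incident above, this alone would have collapsed most of the 6 → 24 tool-call blow-up, because the repeats were byte-identical searches.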
The framework wasn’t the villain. Lack of control was.
Migration path (A → B)
CrewAI → LangGraph (common path)
- log real runs and identify the “happy path”
- encode that path as explicit graph states
- keep a bounded “agentic” branch for edge cases
- keep the same tool gateway + budgets (don’t rewrite governance)
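What “encode the happy path, keep a bounded agentic branch” can look like, as a sketch with hypothetical state names:

```python
# The logged happy path becomes explicit states; low-confidence cases get a
# bounded "agentic" escape hatch instead of "model decides everything".
HAPPY_PATH = ["classify", "search", "draft", "answer"]

def next_state(state: str, confidence: float, agentic_steps_used: int,
               max_agentic_steps: int = 3) -> str:
    # Edge cases may detour, but only a bounded number of times per run.
    if confidence < 0.5 and agentic_steps_used < max_agentic_steps:
        return "agentic_fallback"
    i = HAPPY_PATH.index(state)
    return HAPPY_PATH[min(i + 1, len(HAPPY_PATH) - 1)]
```

The point is the shape, not the names: most traffic flows through transitions you can test and replay, and the open-ended branch has a ceiling.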
LangGraph → CrewAI (when roles matter)
- keep the graph as the orchestrator
- swap specific nodes to call “role agents”
- enforce budgets and stop reasons at the outer loop
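A sketch of swapping one node for a role agent while budgets and stop reasons stay at the outer loop (all names hypothetical):

```python
from typing import Any

def reviewer_role(draft: str) -> str:
    # Stand-in for a CrewAI-style reviewer agent call.
    return f"reviewed:{draft}"

def reviewer_node(state: dict[str, Any], *, steps_left: int) -> dict[str, Any]:
    # The node stays dumb: the outer loop owns budgets and stop reasons,
    # so swapping its internals never changes governance.
    if steps_left <= 0:
        return {**state, "status": "stopped", "stop_reason": "max_steps"}
    return {**state, "review": reviewer_role(state["draft"])}
```

Because the node returns plain state, the orchestrator doesn’t care whether the review came from a role agent, a single prompt, or a rules engine.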
Decision guide
- If you need explicit state and replay → pick LangGraph-style graphs.
- If you need collaboration patterns (reviewer/critic/planner) → CrewAI can fit, but budget it hard.
- If you’re early and under-instrumented → pick the approach that’s easiest to test and trace.
Trade-offs
- Multi-agent can improve quality on complex tasks, but increases coordination failures.
- Graphs improve debuggability, but the state machine becomes real code you must maintain.
- Either way, the control layer is non-optional in production.
When NOT to use it
- Don’t ship multi-agent without timeouts, leases, and stop reasons.
- Don’t build graphs that “just call the model to decide everything” — you lose the point of a graph.
- Don’t pick a framework first. Pick the failure modes you can tolerate.
Checklist (copy/paste)
- [ ] Keep governance framework-agnostic (budgets + tool gateway)
- [ ] Add stop reasons and surface them to users
- [ ] Add repeat detection + tool dedupe
- [ ] Start read-only; gate writes behind approvals
- [ ] Canary changes; expect drift
- [ ] Test replay on golden tasks
Safe default config (YAML)

```yaml
budgets:
  max_steps: 40
  max_tool_calls: 20
  max_seconds: 120
tools:
  allow: ["search.read", "kb.read", "http.get"]
writes:
  require_approval: true
monitoring:
  track: ["tool_calls_per_run", "latency_p95", "stop_reason"]
```
Related failures
- AI agent infinite loop (detect + fix, with code)
- Budget blowout (when an agent burns money) + fixes + code
- Tool spam loops (agent failure + fixes + code)
- Token overrun incidents (prompt bloat) + fixes + code
- Tool response corruption (schema drift + truncation) + code
FAQ
Q: Is multi-agent always better?
A: No. It can improve quality, but it increases coordination failures. You pay for it in observability and governance.
Q: Are graphs only for workflows?
A: No. Graphs can orchestrate agents too. The value is explicit state and testability.
Q: What’s the first guardrail to add?
A: Budgets (steps/tool calls/time) and a tool gateway with a default-deny allowlist.
Q: Can we migrate without rewriting everything?
A: Yes if you keep governance outside the framework: budget guard + tool gateway + logging.
Related pages
- Foundations: Planning vs reactive agents · How agents use tools
- Failure: Deadlocks in multi-agent systems · Tool spam loops
- Governance: Budget controls · Tool permissions
- Production stack: Production agent stack