The problem (production side)
AutoGPT is the archetype of “let it run”. LangGraph is the archetype of “make the loop explicit”.
In production, those two philosophies matter more than library APIs. One optimizes for autonomy. The other optimizes for control.
If you’re shipping to real users with real budgets, you should bias toward control until you’ve earned autonomy.
Quick decision (who picks what)
- Pick LangGraph if you need replay, testing, and explicit stop reasons. It’s the safer default for production systems.
- Pick AutoGPT-style autonomy only when you can tolerate failures and you’ve built budgets, monitoring, and kill switches first.
- If you’re multi-tenant and write-capable, don’t start with “let it run”.
Why teams choose wrong in production
1) They overvalue autonomy early
Early on, autonomy looks like progress. In prod, autonomy without governance looks like:
- tool spam
- budget explosions
- partial outages amplified
2) They underestimate “boring code”
Explicit flows feel less “AI”. They’re also the thing you can debug at 3 AM.
3) They skip the control layer
If you don’t have:
- budgets
- tool permissions
- validation
- stop reasons
…your framework choice won’t save you.
Comparison table
| Criterion | LangGraph-style explicit flow | AutoGPT-style autonomy | What matters in prod |
|---|---|---|---|
| Control | High | Low/medium | Stop runaway loops |
| Debuggability | High | Low | Replay + traces |
| Cost predictability | Better | Worse | Spend spikes |
| Failure amplification | Lower | Higher | Outage containment |
| Best for | Production apps | Experiments / sandboxes | Risk tolerance |
Where it breaks in production
Autonomy breaks
- it keeps trying because “one more try” looks rational
- it retries across layers (agent + tool + http client)
- it explores tool space you forgot to constrain
Explicit flows break
- you ship a big state machine without tests
- you still don’t validate tool outputs, so “explicit” becomes “explicitly wrong”
- you encode too much in prompts and too little in code
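One way to close the validation gap is to check tool outputs at the boundary before they re-enter the flow. A minimal sketch; the expected shape (`url` + `snippet` items) is illustrative, not from any specific framework:

```python
from typing import Any


class ToolOutputError(ValueError):
    """Raised when a tool returns something the flow can't safely consume."""


def validate_search_result(obs: Any) -> list[dict[str, str]]:
    # Reject anything that isn't a list of {"url", "snippet"} dicts,
    # so a malformed tool response fails loudly instead of propagating.
    if not isinstance(obs, list):
        raise ToolOutputError(f"expected list, got {type(obs).__name__}")
    validated = []
    for item in obs:
        if not isinstance(item, dict) or "url" not in item or "snippet" not in item:
            raise ToolOutputError(f"malformed item: {item!r}")
        validated.append({"url": str(item["url"]), "snippet": str(item["snippet"])})
    return validated
```

The point is to fail loudly at the boundary: an explicit graph that silently consumes garbage is still "explicitly wrong".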
Implementation example (real code)
If you want autonomy, you need to sandbox it.
This guardrail pattern:
- caps steps/time/tool calls
- forces a stop reason
- disables writes by default
```python
from dataclasses import dataclass
from typing import Any
import time


@dataclass(frozen=True)
class Budgets:
    max_steps: int = 30
    max_seconds: int = 90
    max_tool_calls: int = 15


class Stop(RuntimeError):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class GuardedTools:
    def __init__(self, *, allow: set[str]):
        self.allow = allow
        self.calls = 0

    def call(self, tool: str, args: dict[str, Any], *, budgets: Budgets) -> Any:
        self.calls += 1
        if self.calls > budgets.max_tool_calls:
            raise Stop("max_tool_calls")
        if tool not in self.allow:
            raise Stop(f"tool_denied:{tool}")
        return tool_impl(tool, args=args)  # (pseudo)


def run_autonomy(task: str, *, budgets: Budgets) -> dict[str, Any]:
    tools = GuardedTools(allow={"search.read", "kb.read", "http.get"})
    started = time.time()
    for _ in range(budgets.max_steps):
        if time.time() - started > budgets.max_seconds:
            return {"status": "stopped", "stop_reason": "max_seconds"}
        action = llm_decide(task)  # (pseudo)
        if action.kind == "final":
            return {"status": "ok", "answer": action.final_answer}
        try:
            obs = tools.call(action.name, action.args, budgets=budgets)
        except Stop as e:
            return {"status": "stopped", "stop_reason": e.reason, "partial": "Stopped safely."}
        task = update(task, action, obs)  # (pseudo)
    return {"status": "stopped", "stop_reason": "max_steps"}
```

The same guardrail in JavaScript:

```javascript
export class Stop extends Error {
  constructor(reason) {
    super(reason);
    this.reason = reason;
  }
}

export class GuardedTools {
  constructor({ allow = [] } = {}) {
    this.allow = new Set(allow);
    this.calls = 0;
  }

  call(tool, args, { budgets }) {
    this.calls += 1;
    if (this.calls > budgets.maxToolCalls) throw new Stop("max_tool_calls");
    if (!this.allow.has(tool)) throw new Stop("tool_denied:" + tool);
    return toolImpl(tool, { args }); // (pseudo)
  }
}
```

Real incident (with numbers)
We saw an “autonomous research agent” shipped without strict budgets. It kept searching until it “felt confident”.
Impact:
- one run lasted ~17 minutes
- tool calls: ~140
- spend: ~$74 (browser + model calls)
- users retried because the UI looked “stuck”, multiplying cost
Fix:
- explicit budgets (steps/time/tool calls/USD)
- degrade mode when search is unstable
- stop reasons surfaced to users
Autonomy didn’t fail because it was “too ambitious”. It failed because it had no brakes.
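A USD budget, the one brake missing from the guardrail example above, can be a per-run spend meter checked on every model or tool call. A sketch; the cap and the idea of surfacing it as a stop reason follow the fix list, but `SpendMeter` and its API are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class SpendMeter:
    max_usd: float = 2.00              # hard per-run spend cap (illustrative value)
    spent_usd: float = field(default=0.0)

    def record(self, cost_usd: float) -> None:
        """Add a cost; raise once the run exceeds its budget."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_usd:
            # Surfaced to the UI as a stop reason, not swallowed as a retry.
            raise RuntimeError("budget_exceeded")
```

Calling `record` after each billable call turns a $74 runaway into a run that stops at its cap with an explainable reason.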
Migration path (A → B)
AutoGPT → LangGraph-style control
- instrument runs (tool calls, tokens, stop reasons)
- identify the common path and encode it explicitly
- keep a bounded autonomous branch for unknowns
- gate writes behind approvals
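The first two steps above can be sketched as a frequency count over recorded tool-call sequences: runs that share a path are your candidates for an explicit, hand-coded flow. The log format (one list of tool names per run) is an assumption:

```python
from collections import Counter


def common_paths(runs: list[list[str]], top: int = 3) -> list[tuple[tuple[str, ...], int]]:
    """Return the most frequent tool-call sequences across instrumented runs.

    A path that dominates the distribution is the "common path" worth
    encoding explicitly; the long tail stays in a bounded autonomous branch.
    """
    counts = Counter(tuple(r) for r in runs)
    return counts.most_common(top)
```

Run it over a week of traces before deciding which branch of the graph to make explicit.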
LangGraph → more autonomy (when you’re ready)
- keep explicit states for risky transitions
- allow autonomy only inside bounded “investigation” nodes
- canary changes and watch drift
Decision guide
- If you need predictable behavior → explicit flow.
- If you need exploration, but can cap it hard → bounded autonomy.
- If you can’t monitor spend and tool calls → don’t ship autonomy.
Trade-offs
- Explicit flows require more engineering upfront.
- Autonomy can solve weird tasks, but increases operational risk.
- Hybrid is usually the sweet spot.
When NOT to use it
- Don’t use autonomy with write tools in multi-tenant prod.
- Don’t use explicit graphs as an excuse to skip validation/monitoring.
- Don’t pick a framework to avoid making governance decisions.
Checklist (copy-paste)
- [ ] Start with explicit flow for the happy path
- [ ] Bound autonomy inside strict budgets
- [ ] Default-deny tools; read-only first
- [ ] Stop reasons returned to UI
- [ ] Monitor tool_calls/run and spend/run
- [ ] Kill switch that disables writes and expensive tools
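The last checklist item can be a single flag check wrapped around tool dispatch. A minimal sketch; the tool names and the flag source are illustrative (in practice the flag would come from a feature-flag service or an environment variable):

```python
# Illustrative tool categories; real systems would load these from config.
EXPENSIVE = {"browser.run", "llm.large"}
WRITES = {"db.write", "email.send"}


def allowed(tool: str, *, kill_switch: bool) -> bool:
    """When the kill switch is on, only cheap read-only tools survive."""
    if kill_switch and (tool in WRITES or tool in EXPENSIVE):
        return False
    return True
```

The value of the pattern is that flipping one flag degrades the agent to read-only and cheap, without a deploy.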
Safe default config (YAML)

```yaml
mode:
  default: "explicit_flow"
autonomy:
  allowed_for: ["investigation_nodes"]
  budgets:
    max_steps: 30
    max_seconds: 90
    max_tool_calls: 15
tools:
  allow: ["search.read", "kb.read", "http.get"]
  writes:
    require_approval: true
```
FAQ
Q: Is AutoGPT inherently ‘bad’?
A: No. It’s a useful model for autonomy. But production needs governance. Without it, autonomy turns into spend and outages.
Q: Do graphs guarantee correctness?
A: No. They guarantee structure. You still need validation and guardrails.
Q: What’s the first production metric?
A: Tool calls/run. It moves early when autonomy starts thrashing.
Q: Can we keep autonomy but be safe?
A: Yes: bound it. Budgets, tool allowlists, and stop reasons are the minimum.
Related pages
- Foundations: Workflow vs agent · Planning vs reactive agents
- Failure: Tool spam loops · Budget explosion
- Governance: Budget controls · Step limits
- Production stack: Production agent stack