The problem (production side)
At some point you’ll hit the same production problem: the shape of the model’s output matters more than its prose.
If you’re calling tools, parsing JSON, and triggering side effects, you need:
- schema validation
- invariants
- fail-closed behavior
That’s where “typed agent frameworks” become attractive — and where “flexible agent frameworks” can either help or hurt, depending on how much discipline your team has.
Quick decision (who should pick what)
- Pick PydanticAI-style typed outputs if your system is tool-heavy and you want validation to be the default, not an afterthought.
- Pick LangChain agents if you need flexibility across integrations and you’re willing to enforce schemas and governance yourself.
- If you don’t validate outputs, it doesn’t matter which you pick — you’ll ship silent failures.
Why teams choose wrong in production
1) They think a framework replaces governance
No framework replaces:
- budgets
- tool permissions
- monitoring
- approvals for writes
2) They treat structured outputs as “nice to have”
In prod, structured outputs are how you prevent:
- tool response corruption turning into actions
- prompt injection steering tool calls
- “close enough JSON” becoming “close enough incident”
3) They over-index on integration count
“It integrates with everything” isn’t a production plan. If your tool gateway is unsafe, more integrations just means more blast radius.
Comparison table
| Criterion | PydanticAI (typed-first) | LangChain agents (flexible) | What matters in prod |
|---|---|---|---|
| Default output validation | Strong | Depends on you | Fail closed |
| Integration surface | Smaller | Larger | Blast radius |
| Debuggability | Better if typed | Better if instrumented | Traces |
| Failure handling | Explicit if enforced | Emergent if loose | Stop reasons |
| Best for | Tool-heavy systems | Rapid integration | Team discipline |
Where it breaks in production
Typed-first breaks
- you still have to maintain schemas
- you can over-constrain and reject useful outputs
- teams misuse typing as “security” (it isn’t)
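The over-constraining failure is worth seeing concretely. A hypothetical sketch: a schema that insists every tool call carries non-empty args rejects a perfectly legitimate argument-free call, because the schema encoded an assumption the tool never had.

```python
from typing import Any

# Hypothetical over-strict rule: every tool call must carry non-empty args.
def validate_overstrict(obj: dict[str, Any]) -> dict[str, Any]:
    if obj.get("kind") != "tool":
        raise ValueError("invalid kind")
    if not isinstance(obj.get("tool"), str):
        raise ValueError("missing tool")
    args = obj.get("args")
    if not isinstance(args, dict) or not args:  # also rejects {}
        raise ValueError("missing args")
    return obj

# A useful call like {"kind": "tool", "tool": "list_queues", "args": {}}
# is now rejected — a schema bug, not a model bug.
```

The cure is the same as for any schema: treat rejections as data, and loosen constraints deliberately rather than abandoning validation.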
Flexible breaks
- silent parse errors
- “best effort” JSON coercion
- tool outputs treated as instructions
- drift changes output shapes without tests
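One cheap guard against the last failure mode — drift silently changing output shapes — is a golden-shape test pinned in CI. A minimal sketch (the expected key sets are illustrative):

```python
# Golden-shape check: if a model or prompt change alters the decision
# shape, this fails in CI instead of in production.
EXPECTED_KEYS = {
    "final": {"kind", "answer"},
    "tool": {"kind", "tool", "args"},
}

def check_shape(decision: dict) -> bool:
    """True only if the decision has exactly the expected keys for its kind."""
    kind = decision.get("kind")
    expected = EXPECTED_KEYS.get(kind)
    return expected is not None and set(decision) == expected
```

Run it over a fixed set of recorded model outputs on every prompt or model change; a new or missing field becomes a red build instead of a mystery incident.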
Implementation example (real code)
No matter what framework you use, put a strict validator between the model and side effects.
This shows a minimal typed decision object with fail-closed parsing.
```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Decision:
    kind: str  # "final" | "tool"
    tool: str | None
    args: dict[str, Any] | None
    answer: str | None

class InvalidDecision(RuntimeError):
    pass

def validate_decision(obj: Any) -> Decision:
    if not isinstance(obj, dict):
        raise InvalidDecision("expected object")
    kind = obj.get("kind")
    if kind not in {"final", "tool"}:
        raise InvalidDecision("invalid kind")
    if kind == "final":
        ans = obj.get("answer")
        if not isinstance(ans, str) or not ans.strip():
            raise InvalidDecision("missing answer")
        return Decision(kind="final", tool=None, args=None, answer=ans)
    tool = obj.get("tool")
    args = obj.get("args")
    if not isinstance(tool, str):
        raise InvalidDecision("missing tool")
    if not isinstance(args, dict):
        raise InvalidDecision("missing args")
    return Decision(kind="tool", tool=tool, args=args, answer=None)
```

The same validator in JavaScript:

```javascript
export class InvalidDecision extends Error {}

export function validateDecision(obj) {
  if (!obj || typeof obj !== "object") throw new InvalidDecision("expected object");
  const kind = obj.kind;
  if (kind !== "final" && kind !== "tool") throw new InvalidDecision("invalid kind");
  if (kind === "final") {
    if (typeof obj.answer !== "string" || !obj.answer.trim()) throw new InvalidDecision("missing answer");
    return { kind: "final", answer: obj.answer };
  }
  if (typeof obj.tool !== "string") throw new InvalidDecision("missing tool");
  if (!obj.args || typeof obj.args !== "object") throw new InvalidDecision("missing args");
  return { kind: "tool", tool: obj.tool, args: obj.args };
}
```

Real incident (with numbers)
We saw a team ship a flexible agent that parsed “tool calls” with best-effort JSON extraction.
During a partial outage, tool output included an HTML error page. The model copied part of it into the “args”. The parser coerced it into a dict.
Impact:
- 17 runs wrote garbage data into a queue
- downstream workers crashed for ~25 minutes
- on-call spent ~2 hours tracing the root cause because logs only had the final answer
Fix:
- strict parsing + schema validation for decisions and tool outputs
- fail closed before any write
- monitoring for `invalid_decision_rate`
Typed outputs didn’t solve this alone — strict validation did.
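The fix can be sketched as a gateway that validates before any write. The stand-in validator below is a stripped-down version of the implementation example above, so the sketch runs on its own; `write_fn` and the metrics dict are placeholders for your real queue and monitoring.

```python
# Minimal stand-ins; in practice these are the Decision/validate_decision
# definitions from the implementation example above.
class InvalidDecision(RuntimeError):
    pass

def validate_decision(obj):
    if not isinstance(obj, dict) or obj.get("kind") not in {"final", "tool"}:
        raise InvalidDecision("invalid decision")
    return obj

metrics: dict[str, int] = {}

def guarded_write(raw, write_fn) -> str:
    """Fail closed: nothing reaches write_fn unless validation passes."""
    try:
        decision = validate_decision(raw)
    except InvalidDecision:
        # Count the rejection instead of acting on garbage.
        metrics["invalid_decision_rate"] = metrics.get("invalid_decision_rate", 0) + 1
        return "rejected"
    if decision["kind"] != "tool":
        return "no_write"
    write_fn(decision["tool"], decision.get("args") or {})
    return "written"
```

An HTML error page masquerading as a decision now increments a metric and never reaches the queue — which is exactly the behavior the incident above was missing.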
Migration path (A → B)
Flexible → typed-first
- add schema validation at the boundary (model output + tool output)
- define a small decision schema (tool vs final)
- gradually type the high-risk parts (writes) first
Typed-first → flexible (when you need it)
- keep typed boundaries for actions and tools
- allow free-form text only inside “analysis” fields that never trigger side effects
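That last rule can be made structural rather than procedural. A sketch, assuming a simple action type (names are illustrative): free-form text lives in an `analysis` field that is logged but never crosses the side-effect boundary.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Action:
    tool: str
    args: dict[str, Any]
    # Free-form model text lives here; it is logged, never executed or parsed.
    analysis: str = ""

def to_side_effect(action: Action) -> tuple[str, dict[str, Any]]:
    """Only the typed fields cross the boundary; analysis stays behind."""
    return (action.tool, action.args)
```

The model can ramble as much as it likes inside `analysis`; the executor physically cannot see it.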
Decision guide
- If your system does writes → prioritize typed/validated boundaries.
- If you’re doing experiments → flexibility is fine, but keep budgets and logging.
- If you’re multi-tenant → strict validation is non-negotiable.
Trade-offs
- Validation rejects some outputs. That’s good. It forces you to handle the failure path.
- Typing adds maintenance overhead.
- Flexibility can ship faster, but it ships more production surprises too.
When NOT to use it
- Don’t rely on typing as security. You still need permissions and approvals.
- Don’t use best-effort parsing for tool calls that trigger writes.
- Don’t skip monitoring. Validation failures are a metric, not a shame.
Checklist (copy-paste)
- [ ] Validate model decisions (schema) before acting
- [ ] Validate tool outputs (schema + invariants)
- [ ] Fail closed for writes
- [ ] Budgets + stop reasons
- [ ] Audit logs for tool calls
- [ ] Canary changes; drift is real
Safe default config (YAML)

```yaml
validation:
  model_decision:
    fail_closed: true
    schema: "Decision(kind, tool?, args?, answer?)"
  tool_output:
    fail_closed: true
    max_chars: 200000
budgets:
  max_steps: 25
  max_tool_calls: 12
monitoring:
  track: ["invalid_decision_rate", "tool_output_invalid_rate", "stop_reason"]
```
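Enforcing the budgets from that config is a few lines. A sketch, with the budget values inlined as a plain dict to keep it dependency-free (in practice you would load them from the YAML above):

```python
BUDGETS = {"max_steps": 25, "max_tool_calls": 12}

class BudgetExceeded(RuntimeError):
    pass

def charge(state: dict, key: str) -> None:
    """Increment a run counter; stop the run with an explicit stop reason."""
    state[key] = state.get(key, 0) + 1
    limit = BUDGETS["max_" + key]
    if state[key] > limit:
        raise BudgetExceeded(f"stop_reason=budget:{key}")
```

Call `charge(state, "steps")` at the top of every agent step and `charge(state, "tool_calls")` before every tool call; the exception message doubles as the stop reason you log and monitor.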
FAQ
Q: Does typing guarantee correctness?
A: No. It guarantees shape. You still need invariants, permissions, and safe-mode behavior.
Q: Is LangChain ‘unsafe’?
A: No. It’s flexible. Safety comes from how you enforce boundaries: budgets, validation, and a tool gateway.
Q: What should we type first?
A: Anything that triggers writes or money: tool calls, approvals, budget policy outputs.
Q: Can strict validation hurt completion rate?
A: Yes. That’s usually the point: stop guessing and handle failure paths explicitly.
Related pages
- Foundations: How agents use tools · How LLM limits affect agents
- Failure: Tool response corruption · Silent agent drift
- Governance: Tool permissions · Budget controls
- Production stack: Production agent stack