PydanticAI vs LangChain Agents (Production Comparison) + Code

  • Pick the right tool without demo-driven regret.
  • See what breaks in production (operability, cost, drift).
  • Get a migration path and decision checklist.
  • Leave with defaults: budgets, validation, stop reasons.
Typed outputs vs flexible abstractions. Where each helps, where each hides failure modes, and what you need for production: validation, budgets, and observability.
On this page
  1. Problem-first intro
  2. Quick decision (who should pick what)
  3. Why people pick the wrong option in production
  4. 1) They think a framework replaces governance
  5. 2) They treat structured outputs as “nice to have”
  6. 3) They over-index on integration count
  7. Comparison table
  8. Where this breaks in production
  9. Typed-first breaks
  10. Flexible breaks
  11. Implementation example (real code)
  12. Real failure case (incident-style, with numbers)
  13. Migration path (A → B)
  14. Flexible → typed-first
  15. Typed-first → flexible (when you need it)
  16. Decision guide
  17. Trade-offs
  18. When NOT to use
  19. Copy-paste checklist
  20. Safe default config snippet (JSON/YAML)
  21. FAQ
  22. Related pages

Problem-first intro

At some point you’ll hit the same production problem: the shape of the model’s output matters more than its prose.

If you’re calling tools, parsing JSON, and triggering side effects, you need:

  • schema validation
  • invariants
  • fail-closed behavior

That’s where “typed agent frameworks” are attractive. And where “flexible agent frameworks” can either help or hurt, depending on how much discipline your team has.

Quick decision (who should pick what)

  • Pick PydanticAI-style typed outputs if your system is tool-heavy and you want validation to be the default, not an afterthought.
  • Pick LangChain agents if you need flexibility across integrations and you’re willing to enforce schemas and governance yourself.
  • If you don’t validate outputs, it doesn’t matter which you pick — you’ll ship silent failures.

Why people pick the wrong option in production

1) They think a framework replaces governance

No framework replaces:

  • budgets
  • tool permissions
  • monitoring
  • approvals for writes
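None of these need to be elaborate to start. A minimal sketch of what "governance outside the framework" can look like — the names here (`ToolGateway`, the allowlist, the budget cap) are illustrative, not from either library:

```python
class BudgetExceeded(RuntimeError):
    pass


class ToolNotAllowed(RuntimeError):
    pass


class ToolGateway:
    """Enforces a tool allowlist and a call budget outside the framework."""

    def __init__(self, allowlist, max_calls):
        self.allowlist = set(allowlist)
        self.max_calls = max_calls
        self.calls = 0

    def call(self, tool, fn, **args):
        # Permission check first: unknown tools never run, budget or not.
        if tool not in self.allowlist:
            raise ToolNotAllowed(tool)
        # Budget check: fail closed once the cap is hit.
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"{self.calls} tool calls used")
        self.calls += 1
        return fn(**args)
```

Route every framework tool call through `gateway.call(...)`; approvals for writes can hook into the same choke point.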

2) They treat structured outputs as “nice to have”

In prod, structured outputs are how you prevent:

  • tool response corruption turning into actions
  • prompt injection steering tool calls
  • “close enough JSON” becoming “close enough incident”

3) They over-index on integration count

“It integrates with everything” isn’t a production plan. If your tool gateway is unsafe, more integrations just means more blast radius.

Comparison table

| Criterion | PydanticAI (typed-first) | LangChain agents (flexible) | What matters in prod |
|---|---|---|---|
| Default output validation | Strong | Depends on you | Fail closed |
| Integration surface | Smaller | Larger | Blast radius |
| Debuggability | Better if typed | Better if instrumented | Traces |
| Failure handling | Explicit if enforced | Emergent if loose | Stop reasons |
| Best for | Tool-heavy systems | Rapid integration | Team discipline |

Where this breaks in production

Typed-first breaks

  • you still have to maintain schemas
  • you can over-constrain and reject useful outputs
  • teams misuse typing as “security” (it isn’t)

Flexible breaks

  • silent parse errors
  • “best effort” JSON coercion
  • tool outputs treated as instructions
  • drift changes output shapes without tests
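The "best effort JSON coercion" failure is easy to reproduce. The `lenient_extract` helper below is a deliberately naive stand-in for what many agent loops do: it happily pulls a plausible-looking dict out of an HTML error page, while strict parsing fails closed:

```python
import json
import re


def lenient_extract(text):
    # Naive "best effort" extraction: grab the first {...} span and
    # hope it is the tool call. This is how garbage becomes actions.
    m = re.search(r"\{.*\}", text, re.DOTALL)
    return json.loads(m.group(0)) if m else None


def strict_parse(text):
    # Fail closed: the whole payload must be valid JSON, or we stop.
    return json.loads(text)


html_error = '<html>502 Bad Gateway {"detail": "upstream timeout"}</html>'

print(lenient_extract(html_error))  # a dict that looks like tool args
try:
    strict_parse(html_error)
except json.JSONDecodeError:
    print("rejected")  # fail closed before any write happens
```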

Implementation example (real code)

No matter what framework you use, put a strict validator between the model and side effects.

This shows a minimal typed decision object with fail-closed parsing.

PYTHON
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Decision:
    kind: str  # "final" | "tool"
    tool: str | None
    args: dict[str, Any] | None
    answer: str | None


class InvalidDecision(RuntimeError):
    pass


def validate_decision(obj: Any) -> Decision:
    if not isinstance(obj, dict):
        raise InvalidDecision("expected object")
    kind = obj.get("kind")
    if kind not in {"final", "tool"}:
        raise InvalidDecision("invalid kind")
    if kind == "final":
        ans = obj.get("answer")
        if not isinstance(ans, str) or not ans.strip():
            raise InvalidDecision("missing answer")
        return Decision(kind="final", tool=None, args=None, answer=ans)
    tool = obj.get("tool")
    args = obj.get("args")
    if not isinstance(tool, str):
        raise InvalidDecision("missing tool")
    if not isinstance(args, dict):
        raise InvalidDecision("missing args")
    return Decision(kind="tool", tool=tool, args=args, answer=None)
JAVASCRIPT
export class InvalidDecision extends Error {}

export function validateDecision(obj) {
  if (!obj || typeof obj !== "object") throw new InvalidDecision("expected object");
  const kind = obj.kind;
  if (kind !== "final" && kind !== "tool") throw new InvalidDecision("invalid kind");

  if (kind === "final") {
    if (typeof obj.answer !== "string" || !obj.answer.trim()) throw new InvalidDecision("missing answer");
    return { kind: "final", answer: obj.answer };
  }

  if (typeof obj.tool !== "string") throw new InvalidDecision("missing tool");
  if (!obj.args || typeof obj.args !== "object") throw new InvalidDecision("missing args");
  return { kind: "tool", tool: obj.tool, args: obj.args };
}
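Once you have a validator like this, the calling side stays small: parse strictly, validate, and only then dispatch. A Python sketch of that loop — `validate_decision` here is a minimal stand-in with the same fail-closed contract as the full validator above:

```python
import json


class InvalidDecision(RuntimeError):
    pass


def validate_decision(obj):
    # Stand-in for the full validator above; same fail-closed contract.
    if not isinstance(obj, dict) or obj.get("kind") not in {"final", "tool"}:
        raise InvalidDecision("invalid decision")
    return obj


def act(raw_model_output, tools):
    # Parse strictly, validate, and only then touch side effects.
    try:
        decision = validate_decision(json.loads(raw_model_output))
    except (json.JSONDecodeError, InvalidDecision):
        return {"status": "rejected"}  # fail closed: nothing ran
    if decision["kind"] == "final":
        return {"status": "done", "answer": decision.get("answer")}
    return {"status": "tool", "result": tools[decision["tool"]](**decision.get("args", {}))}


tools = {"echo": lambda text: text}
print(act('{"kind": "tool", "tool": "echo", "args": {"text": "hi"}}', tools))
print(act("not json at all", tools))  # rejected, no side effect
```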

Real failure case (incident-style, with numbers)

We saw a team ship a flexible agent that parsed “tool calls” with best-effort JSON extraction.

During a partial outage, tool output included an HTML error page. The model copied part of it into the “args”. The parser coerced it into a dict.

Impact:

  • 17 runs wrote garbage data into a queue
  • downstream workers crashed for ~25 minutes
  • on-call spent ~2 hours tracing the root cause because logs only had the final answer

Fix:

  1. strict parsing + schema validation for decisions and tool outputs
  2. fail closed before any write
  3. monitoring for invalid_decision_rate

Typed outputs didn’t solve this alone — strict validation did.
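Tracking `invalid_decision_rate` does not require a metrics platform to start. A minimal in-process sketch (in production you would export these counts to your real metrics backend):

```python
from collections import Counter


class DecisionMetrics:
    """Counts validation outcomes so invalid_decision_rate is observable."""

    def __init__(self):
        self.counts = Counter()

    def record(self, valid: bool):
        self.counts["total"] += 1
        if not valid:
            self.counts["invalid"] += 1

    @property
    def invalid_decision_rate(self) -> float:
        total = self.counts["total"]
        return self.counts["invalid"] / total if total else 0.0


m = DecisionMetrics()
for ok in [True, True, False, True]:
    m.record(ok)
print(m.invalid_decision_rate)  # 0.25
```

Alert when the rate jumps: a sudden spike usually means drift or a broken upstream tool, not "the model got worse at JSON".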

Migration path (A → B)

Flexible → typed-first

  1. add schema validation at the boundary (model output + tool output)
  2. define a small decision schema (tool vs final)
  3. gradually type the high-risk parts (writes) first

Typed-first → flexible (when you need it)

  1. keep typed boundaries for actions and tools
  2. allow free-form text only inside “analysis” fields that never trigger side effects
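One way to express step 2 is a typed tool-call object with a free-form analysis field that is logged for humans but never parsed or executed. The field names here are illustrative:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    # Typed boundary: these fields drive side effects, so they stay validated.
    tool: str
    args: dict[str, Any]
    # Free-form escape hatch: logged for humans, never parsed or executed.
    analysis: str = ""
```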

Decision guide

  • If your system does writes → prioritize typed/validated boundaries.
  • If you’re doing experiments → flexibility is fine, but keep budgets and logging.
  • If you’re multi-tenant → strict validation is non-negotiable.

Trade-offs

  • Validation rejects some outputs. That’s good. It forces you to handle the failure path.
  • Typing adds maintenance overhead.
  • Flexibility can ship faster, but it ships more production surprises too.

When NOT to use

  • Don’t rely on typing as security. You still need permissions and approvals.
  • Don’t use best-effort parsing for tool calls that trigger writes.
  • Don’t skip monitoring. Validation failures are a metric, not a shame.

Copy-paste checklist

  • [ ] Validate model decisions (schema) before acting
  • [ ] Validate tool outputs (schema + invariants)
  • [ ] Fail closed for writes
  • [ ] Budgets + stop reasons
  • [ ] Audit logs for tool calls
  • [ ] Canary changes; drift is real

Safe default config snippet (JSON/YAML)

YAML
validation:
  model_decision:
    fail_closed: true
    schema: "Decision(kind, tool?, args?, answer?)"
  tool_output:
    fail_closed: true
    max_chars: 200000
budgets:
  max_steps: 25
  max_tool_calls: 12
monitoring:
  track: ["invalid_decision_rate", "tool_output_invalid_rate", "stop_reason"]
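Enforcing the budgets section above takes a few lines. This sketch uses a plain dict mirroring the YAML (substitute your YAML loader); `StopRun` and its reason strings are illustrative, not from either framework:

```python
# The dict mirrors the YAML above; load it with your YAML parser of choice.
CONFIG = {"budgets": {"max_steps": 25, "max_tool_calls": 12}}


class StopRun(Exception):
    """Raised with an explicit stop reason instead of looping forever."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


class BudgetTracker:
    def __init__(self, config):
        self.max_steps = config["budgets"]["max_steps"]
        self.max_tool_calls = config["budgets"]["max_tool_calls"]
        self.steps = 0
        self.tool_calls = 0

    def step(self):
        self.steps += 1
        if self.steps > self.max_steps:
            raise StopRun("budget_exceeded:max_steps")

    def tool_call(self):
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise StopRun("budget_exceeded:max_tool_calls")
```

The `reason` string is what you log and monitor as `stop_reason`; a run that ends without one is itself a signal worth alerting on.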

FAQ

Does typing guarantee correctness?
No. It guarantees shape. You still need invariants, permissions, and safe-mode behavior.
Is LangChain ‘unsafe’?
No. It’s flexible. Safety comes from how you enforce boundaries: budgets, validation, and a tool gateway.
What should we type first?
Anything that triggers writes or money: tool calls, approvals, budget policy outputs.
Can strict validation hurt completion rate?
Yes. That’s usually the point: stop guessing and handle failure paths explicitly.

⏱️ 7 min read · Updated Mar 2026 · Difficulty: ★★☆
Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.