OpenAI Agents vs custom agents (production comparison) + code

  • Choose without getting fooled by the demo.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path + a checklist.
  • Leave with defaults: budgets, validation, stop reasons.
Managed ‘agents’ framework vs custom implementation. In prod, what counts is control, observability, portability, and the ability to stop fast when things go sideways.

The problem (production side)

At some point you’ll ask: “Do we use a managed agent platform or build our own?”

The wrong answer is religious:

  • “never build, always buy”
  • “never buy, always build”

In production, the real question is: where do you want the control layer to live?

Because the control layer (budgets, permissions, approvals, tracing) is the thing that decides whether your agent is a product feature or an incident generator.

Quick decision (who picks what)

  • Pick OpenAI/managed agent frameworks when speed-to-first-version matters and your risk is low-to-medium.
  • Pick custom agents when you need deep integration with internal systems, strict governance, or unusual observability requirements.
  • Many teams should do both: start managed, then pull pieces in-house as the control layer hardens.

Why teams choose wrong in production

1) They assume “managed” means “safe”

Managed doesn’t automatically mean:

  • least privilege
  • approvals for writes
  • multi-tenant isolation
  • audit logs that satisfy your compliance team

You still own safety. You just outsource some plumbing.

2) They assume “custom” means “more control”

Custom code can also mean:

  • no monitoring
  • no budgets
  • no replay

Control is a discipline, not a repo.

3) They lock into the wrong abstraction

If your architecture couples:

  • agent loop
  • tool gateway
  • observability

…you can’t migrate without a rewrite.

Keep your control layer framework-agnostic.
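
One way to read "framework-agnostic" is that the runtime only ever receives a tool-calling function, never the tools themselves. The sketch below illustrates that idea; `AgentRuntime`, `ControlLayer`, and `ScriptedRuntime` are hypothetical names invented for this example, not part of any SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol

ToolFn = Callable[[str, dict[str, Any]], Any]


class AgentRuntime(Protocol):
    """Any runtime (managed or custom) that reaches tools only via `call_tool`."""

    def run(self, task: str, call_tool: ToolFn) -> str: ...


@dataclass
class ControlLayer:
    """Owns policy and logging; knows nothing about the runtime's internals."""

    allowed_tools: set[str]
    tools: dict[str, Callable[..., Any]]
    log: list[dict[str, Any]] = field(default_factory=list)

    def call_tool(self, name: str, args: dict[str, Any]) -> Any:
        allowed = name in self.allowed_tools
        # Log every attempt, including denied ones, before acting on it.
        self.log.append({"tool": name, "args": args, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"not allowed: {name}")
        return self.tools[name](**args)

    def run(self, runtime: AgentRuntime, task: str) -> str:
        return runtime.run(task, self.call_tool)


class ScriptedRuntime:
    """Stand-in runtime for the example: calls one tool, then answers."""

    def run(self, task: str, call_tool: ToolFn) -> str:
        result = call_tool("search", {"query": task})
        return f"answer based on {result}"


control = ControlLayer(
    allowed_tools={"search"},
    tools={"search": lambda query: f"results for {query!r}"},
)
answer = control.run(ScriptedRuntime(), "refund policy")
```

Swapping `ScriptedRuntime` for another runtime leaves `ControlLayer` untouched, which is the property that makes a later migration a controlled change rather than a rewrite.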

Comparison table

| Criterion | Managed agents | Custom agents | What matters in prod |
|---|---|---|---|
| Time to ship | Faster | Slower | Team velocity |
| Governance hooks | Varies | You decide | Can you enforce? |
| Observability | Varies | You build | Debuggability |
| Multi-tenant isolation | Varies | You build | Blast radius |
| Flexibility | Medium | High | Tool integration |
| Migration | Risky if coupled | Under your control | Avoid rewrites |

Where it breaks in production

Managed breaks when:

  • you can’t enforce your permission model
  • you can’t get the logs you need for audits/replay
  • you need custom gating (risk tiers, per-tenant kill switches)
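
As a rough illustration of the custom gating mentioned in the last bullet, here is a minimal sketch of risk tiers plus a per-tenant kill switch. The `Risk` levels, the `Gate` class, and the tool and tenant names are assumptions made up for this example; a real system would persist the kill-switch state somewhere shared.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Risk(IntEnum):
    READ = 0
    WRITE = 1
    DESTRUCTIVE = 2


class Blocked(RuntimeError):
    """Raised when the gate refuses a tool call."""


@dataclass
class Gate:
    tool_risk: dict[str, Risk]           # risk tier of each known tool
    max_risk_by_tenant: dict[str, Risk]  # highest tier each tenant may use
    killed_tenants: set[str] = field(default_factory=set)

    def kill(self, tenant_id: str) -> None:
        """Flip the kill switch: every later call for this tenant is blocked."""
        self.killed_tenants.add(tenant_id)

    def check(self, tool: str, tenant_id: str) -> None:
        if tenant_id in self.killed_tenants:
            raise Blocked(f"kill switch active for tenant {tenant_id}")
        if tool not in self.tool_risk:
            raise Blocked(f"unknown tool: {tool}")  # default-deny
        # Tenants without an explicit ceiling get read-only access.
        ceiling = self.max_risk_by_tenant.get(tenant_id, Risk.READ)
        if self.tool_risk[tool] > ceiling:
            raise Blocked(f"{tool} exceeds risk ceiling for tenant {tenant_id}")


gate = Gate(
    tool_risk={"ticket.read": Risk.READ, "ticket.create": Risk.WRITE},
    max_risk_by_tenant={"acme": Risk.WRITE},
)
gate.check("ticket.create", "acme")  # passes: acme may use WRITE tools
```

If the managed runtime gives you no place to call something like `gate.check` before a tool executes, that is the signal the bullets above describe.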

Custom breaks when:

  • you skip the boring parts (budgets, stop reasons, tracing)
  • you build a “framework” nobody understands
  • you add features faster than you add monitoring
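
The "boring parts" fit in a few dozen lines. Below is a minimal sketch of a budgeted loop that always returns a stop reason. The `step` callback shape, the `RunResult` type, and the 60-second limit are assumptions for illustration; the step and tool-call limits mirror the default config further down this page.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Budget:
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: float = 60.0  # assumed value, not from the config on this page


@dataclass
class RunResult:
    output: Optional[str]
    stop_reason: str  # "completed", "max_steps", "max_tool_calls", or "timeout"
    steps: int
    tool_calls: int


def run_with_budget(
    step: Callable[[int], tuple[Optional[str], int]], budget: Budget
) -> RunResult:
    """Run `step(i)` until it returns an output or a budget is exhausted.

    `step` returns (final_output_or_None, tool_calls_made_in_this_step).
    Budgets are checked before each step, so a single step that makes
    several tool calls can overshoot `max_tool_calls` slightly.
    """
    started = time.monotonic()
    steps = tool_calls = 0
    while True:
        if steps >= budget.max_steps:
            return RunResult(None, "max_steps", steps, tool_calls)
        if tool_calls >= budget.max_tool_calls:
            return RunResult(None, "max_tool_calls", steps, tool_calls)
        if time.monotonic() - started >= budget.max_seconds:
            return RunResult(None, "timeout", steps, tool_calls)
        output, used = step(steps)
        steps += 1
        tool_calls += used
        if output is not None:
            return RunResult(output, "completed", steps, tool_calls)


# An agent that never finishes and makes one tool call per step.
runaway = run_with_budget(lambda i: (None, 1), Budget())
```

Here `runaway.stop_reason` is `"max_tool_calls"` after 12 steps, which is the kind of explicit outcome you can log, alert on, and show to the user.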

Implementation example (real code)

This is the invariant that makes migration possible: all tools go through your gateway.

If your tool gateway is yours, you can switch agent runtimes later without changing safety.

PYTHON
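from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    allow: set[str]             # default-deny: anything not listed is rejected
    require_approval: set[str]  # tools that need a human sign-off first


class Denied(RuntimeError):
    """Raised when policy rejects a tool call."""


def require_human_approval(*, tool: str, args: dict[str, Any], tenant_id: str) -> str:
    # Placeholder (pseudo): block until a human approves, then return a token.
    raise NotImplementedError("wire this to your approval flow")


def load_scoped_credentials(*, tool: str, tenant_id: str, env: str) -> Any:
    # Placeholder (pseudo): fetch credentials scoped to this tool, tenant, and env.
    raise NotImplementedError("wire this to your secrets store")


class ToolGateway:
    def __init__(self, *, policy: Policy, impls: dict[str, Callable[..., Any]]):
        self.policy = policy
        self.impls = impls

    def call(self, tool: str, args: dict[str, Any], *, tenant_id: str, env: str) -> Any:
        # Policy check comes first, before any credentials are loaded.
        if tool not in self.policy.allow:
            raise Denied(f"not allowed: {tool}")
        if tool in self.policy.require_approval:
            token = require_human_approval(tool=tool, args=args, tenant_id=tenant_id)
            args = {**args, "approval_token": token}  # copy; the caller's dict is untouched

        creds = load_scoped_credentials(tool=tool, tenant_id=tenant_id, env=env)
        fn = self.impls[tool]
        return fn(args=args, creds=creds)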
JAVASCRIPT
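// requireHumanApproval and loadScopedCredentials are placeholders (pseudo):
// supply your own approval flow and secrets store.
export class Denied extends Error {}

export class ToolGateway {
  constructor({ policy, impls }) {
    this.policy = policy; // { allow: string[], requireApproval: string[] }
    this.impls = impls;   // { [toolName]: async ({ args, creds }) => result }
  }

  async call({ tool, args, tenantId, env }) {
    // Policy check comes first, before any credentials are loaded.
    if (!this.policy.allow.includes(tool)) throw new Denied("not allowed: " + tool);
    if (this.policy.requireApproval.includes(tool)) {
      const token = await requireHumanApproval({ tool, args, tenantId }); // (pseudo)
      args = { ...args, approval_token: token };
    }
    const creds = await loadScopedCredentials({ tool, tenantId, env }); // (pseudo)
    const fn = this.impls[tool];
    return fn({ args, creds });
  }
}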

Real incident (with numbers)

We saw a team adopt a managed agent runtime quickly (good call). Then they exposed write tools without building an external gateway (bad call).

A prompt injection payload in a ticket steered the agent into a write path.

Impact:

  • 9 bogus tickets created
  • ~45 minutes of cleanup + trust loss
  • they disabled the agent and rewired tools behind a gateway anyway

The lesson wasn’t “managed is bad”. The lesson was “tool governance can’t be implicit”.

Migration path (A → B)

Managed → custom (common)

  1. move tool calling behind your gateway first
  2. add budgets, stop reasons, monitoring outside the runtime
  3. replay traces to validate parity
  4. swap the runtime once governance is stable
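
Step 3 can be sketched as follows: feed a recorded task to the candidate runtime, serve recorded tool results instead of calling real tools, and compare the tool-call sequence. The trace shape (`run_id`, `task`, `tool_calls` with `tool`/`args`/`result`) and both toy runtimes are assumptions for this example, not a standard format.

```python
from typing import Any, Callable

ToolFn = Callable[[str, dict[str, Any]], Any]
Runtime = Callable[[str, ToolFn], Any]


def replay(trace: dict[str, Any], runtime: Runtime) -> dict[str, Any]:
    """Re-run a recorded task against `runtime` without real side effects."""
    recorded = trace["tool_calls"]
    observed: list[dict[str, Any]] = []

    def fake_tool(name: str, args: dict[str, Any]) -> Any:
        index = len(observed)
        observed.append({"tool": name, "args": args})
        # Serve the recorded result by position; extra calls get None.
        return recorded[index]["result"] if index < len(recorded) else None

    runtime(trace["task"], fake_tool)
    expected = [{"tool": c["tool"], "args": c["args"]} for c in recorded]
    return {
        "run_id": trace["run_id"],
        "parity": observed == expected,
        "expected": expected,
        "observed": observed,
    }


trace = {
    "run_id": "run-001",
    "task": "summarize ticket 42",
    "tool_calls": [
        {"tool": "ticket.read", "args": {"id": 42}, "result": {"title": "Login fails"}},
    ],
}


def old_runtime(task: str, call_tool: ToolFn) -> str:
    ticket = call_tool("ticket.read", {"id": 42})
    return f"summary: {ticket['title']}"


def new_runtime(task: str, call_tool: ToolFn) -> str:
    ticket = call_tool("ticket.read", {"id": 42})
    call_tool("ticket.create", {"title": "follow-up"})  # unexpected extra write
    return f"summary: {ticket['title']}"


same = replay(trace, old_runtime)
drifted = replay(trace, new_runtime)
```

Exact sequence equality is a strict parity check. With a non-deterministic model you would likely relax it, for example by comparing only which write tools were called.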

Custom → managed (when you want speed)

  1. keep your gateway and logging
  2. use managed runtime for orchestration/model calls
  3. keep kill switch and approvals outside

Decision guide

  • If you can’t operate it without deep hooks → custom.
  • If speed matters and tools are low-risk → managed is fine.
  • If you’re multi-tenant with writes → gateway-first, regardless of runtime.
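
The three rules above can be written down as a small function, if only to make the precedence explicit. This is a sketch: the parameter names are invented here, and the `"either"` fallback for cases the guide does not cover is an assumption, not a recommendation from the text.

```python
def recommend(
    *,
    needs_deep_hooks: bool,
    speed_matters: bool,
    tools_low_risk: bool,
    multi_tenant: bool,
    has_write_tools: bool,
) -> dict[str, object]:
    # Gateway-first is independent of the runtime choice.
    gateway_first = multi_tenant and has_write_tools

    if needs_deep_hooks:
        runtime = "custom"
    elif speed_matters and tools_low_risk:
        runtime = "managed"
    else:
        runtime = "either"  # assumption: the guide gives no rule for this case

    return {"runtime": runtime, "gateway_first": gateway_first}


choice = recommend(
    needs_deep_hooks=False,
    speed_matters=True,
    tools_low_risk=True,
    multi_tenant=True,
    has_write_tools=True,
)
```

Note that `gateway_first` can be true for either runtime, which matches the "regardless of runtime" wording in the third rule.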

Trade-offs

  • Managed can reduce engineering load, but may constrain governance hooks.
  • Custom can do anything, including shipping without monitoring.
  • Migration is easier if governance is externalized.

When NOT to use it

  • Don’t pick managed if you can’t get audit logs or enforce permissions.
  • Don’t pick custom as an excuse to skip governance.
  • Don’t couple your tool calls directly to the model runtime.

Checklist (copy-paste)

  • [ ] Tool gateway you own (policy + approvals)
  • [ ] Budgets: steps/time/tool calls/USD
  • [ ] Stop reasons returned to users
  • [ ] Tracing: run_id/step_id + tool logs
  • [ ] Canary changes; expect drift
  • [ ] Replay traces for migrations
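
For the tracing item, a minimal sketch of what "run_id/step_id + tool logs" could look like in practice. The `Tracer` class and its event fields are assumptions for illustration; in a real deployment you would ship these events to your logging pipeline rather than keep them in memory.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Tracer:
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list[dict[str, Any]] = field(default_factory=list)
    _step: int = field(default=0, repr=False)

    def _emit(self, kind: str, **data: Any) -> dict[str, Any]:
        # Every event carries run_id and a monotonically increasing step_id.
        self._step += 1
        event = {"run_id": self.run_id, "step_id": self._step, "type": kind, **data}
        self.events.append(event)
        return event

    def tool_call(self, tool: str, args: dict[str, Any], outcome: str) -> dict[str, Any]:
        return self._emit("tool_call", tool=tool, args=args, outcome=outcome)

    def stop(self, reason: str) -> dict[str, Any]:
        return self._emit("stop", reason=reason)

    def to_jsonl(self) -> str:
        """One JSON object per line, ready for a log file or replay store."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)


tracer = Tracer(run_id="run-001")
tracer.tool_call("ticket.read", {"id": 42}, outcome="ok")
tracer.tool_call("ticket.close", {"id": 42}, outcome="denied")
tracer.stop("max_tool_calls")
```

Logging denied calls and the final stop reason alongside successful calls is what makes the trace usable for audits and for the replay step in the migration path.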

Safe default config (JSON/YAML)

YAML
architecture:
  tool_gateway: "owned"
  runtime: "managed_or_custom"
budgets:
  max_steps: 25
  max_tool_calls: 12
approvals:
  required_for: ["db.write", "email.send", "ticket.close"]
logging:
  tool_calls: true
  stop_reasons: true
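
A config like this is only a default if something refuses to start without it. The sketch below parses a JSON equivalent of the YAML above (JSON is used so the example needs only the standard library) and checks it at startup. The `validate_config` function and its specific rules are assumptions for illustration.

```python
import json

CONFIG_JSON = """
{
  "architecture": {"tool_gateway": "owned", "runtime": "managed_or_custom"},
  "budgets": {"max_steps": 25, "max_tool_calls": 12},
  "approvals": {"required_for": ["db.write", "email.send", "ticket.close"]},
  "logging": {"tool_calls": true, "stop_reasons": true}
}
"""


def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is acceptable."""
    problems = []

    if cfg.get("architecture", {}).get("tool_gateway") != "owned":
        problems.append("architecture.tool_gateway must be 'owned'")

    budgets = cfg.get("budgets", {})
    for key in ("max_steps", "max_tool_calls"):
        value = budgets.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"budgets.{key} must be a positive integer")

    if not cfg.get("approvals", {}).get("required_for"):
        problems.append("approvals.required_for must list at least one tool")

    logging_cfg = cfg.get("logging", {})
    for key in ("tool_calls", "stop_reasons"):
        if logging_cfg.get(key) is not True:
            problems.append(f"logging.{key} must be true")

    return problems


config = json.loads(CONFIG_JSON)
problems = validate_config(config)
```

Failing fast on a non-empty `problems` list keeps a missing budget or disabled logging from reaching production silently.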

FAQ

Q: Will managed agents solve reliability for us?
A: Not automatically. Reliability comes from budgets, tool policy, observability, and safe failure paths.

Q: What’s the non-negotiable piece to own?
A: The tool gateway and governance. That’s where side effects are controlled.

Q: How do we avoid lock-in?
A: Keep tools, budgets, and logging outside the runtime. Then swapping runtimes is a controlled change.

Q: What’s the first production control to add?
A: Default-deny tool allowlist + step/tool budgets + stop reasons.

⏱️ 6 min read · Updated March 2026 · Difficulty: ★★☆
Author

This documentation is curated and maintained by engineers who deploy AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

The patterns and recommendations draw on post-mortems, failure modes, and operational incidents in deployed systems, including from building and operating governance infrastructure for agents at OnceOnly.