OpenAI Agents vs custom agents (production comparison) + code

  • Choose well, without regretting a decision made on the strength of a demo.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path + checklist.
  • Leave with defaults: budgets, validation, stop reasons.
Agent framework/service vs your own runtime. In prod, what matters is control, observability, portability, and the ability to stop fast when things get out of hand.
On this page
  1. The problem (in production)
  2. Quick decision (who should pick what)
  3. Why teams choose wrong in production
  4. 1) They assume “managed” means “safe”
  5. 2) They assume “custom” means “more control”
  6. 3) They lock into the wrong abstraction
  7. Comparison table
  8. Where it breaks in production
  9. Managed breaks when:
  10. Custom breaks when:
  11. Implementation example (real code)
  12. Real incident (with numbers)
  13. Migration path (A → B)
  14. Managed → custom (common)
  15. Custom → managed (when you want speed)
  16. Decision guide
  17. Trade-offs
  18. When NOT to use it
  19. Checklist (copy/paste)
  20. Safe default config (JSON/YAML)
  21. FAQ

The problem (in production)

At some point you’ll ask: “Do we use a managed agent platform or build our own?”

The wrong answer is religious:

  • “never build, always buy”
  • “never buy, always build”

In production, the real question is: where do you want the control layer to live?

Because the control layer (budgets, permissions, approvals, tracing) is the thing that decides whether your agent is a product feature or an incident generator.

Quick decision (who should pick what)

  • Pick OpenAI/managed agent frameworks when speed-to-first-version matters and your risk is low-to-medium.
  • Pick custom agents when you need deep integration with internal systems, strict governance, or unusual observability requirements.
  • Many teams should do both: start managed, then pull pieces in-house as the control layer hardens.

Why teams choose wrong in production

1) They assume “managed” means “safe”

Managed doesn’t automatically mean:

  • least privilege
  • approvals for writes
  • multi-tenant isolation
  • audit logs that satisfy your compliance team

You still own safety. You just outsource some plumbing.

2) They assume “custom” means “more control”

Custom code can also mean:

  • no monitoring
  • no budgets
  • no replay

Control is a discipline, not a repo.

3) They lock into the wrong abstraction

If your architecture couples:

  • agent loop
  • tool gateway
  • observability

…you can’t migrate without a rewrite.

Keep your control layer framework-agnostic.
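
One way to picture a framework-agnostic control layer is a narrow interface between the runtime and everything else. The sketch below is illustrative only: `AgentRuntime`, `EchoRuntime`, `run_agent`, and the `search` tool name are hypothetical and do not correspond to any specific SDK. The point is that the runtime only ever sees a `call_tool` function you hand it, so logging and policy stay on your side.

```python
from typing import Any, Callable, Protocol

# A tool caller takes (tool_name, args) and returns the tool result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


class AgentRuntime(Protocol):
    """Hypothetical seam: any runtime (managed or custom) that can run a task."""

    def run(self, task: str, call_tool: ToolCaller) -> str: ...


class EchoRuntime:
    """Stand-in runtime for illustration: calls one tool and returns its result."""

    def run(self, task: str, call_tool: ToolCaller) -> str:
        return str(call_tool("search", {"q": task}))


def run_agent(runtime: AgentRuntime, task: str, call_tool: ToolCaller,
              log: list[dict[str, Any]]) -> str:
    """Run a task while recording every tool call outside the runtime."""

    def traced(tool: str, args: dict[str, Any]) -> Any:
        log.append({"tool": tool, "args": args})  # observability lives here
        return call_tool(tool, args)              # policy lives behind call_tool

    return runtime.run(task, traced)
```

Swapping `EchoRuntime` for an adapter around a managed runtime would leave `run_agent`, the log, and the tool caller unchanged, which is the property the section argues for.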

Comparison table

| Criterion | Managed agents | Custom agents | What matters in prod |
|---|---|---|---|
| Time to ship | Faster | Slower | Team velocity |
| Governance hooks | Varies | You decide | Can you enforce? |
| Observability | Varies | You build | Debuggability |
| Multi-tenant isolation | Varies | You build | Blast radius |
| Flexibility | Medium | High | Tool integration |
| Migration | Risky if coupled | Under your control | Avoid rewrites |

Where it breaks in production

Managed breaks when:

  • you can’t enforce your permission model
  • you can’t get the logs you need for audits/replay
  • you need custom gating (risk tiers, per-tenant kill switches)

Custom breaks when:

  • you skip the boring parts (budgets, stop reasons, tracing)
  • you build a “framework” nobody understands
  • you add features faster than you add monitoring
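
The "boring parts" can be small. Below is a minimal sketch of a budget check that returns an explicit stop reason, assuming an agent loop you control. The names `Budget`, `StopReason`, and `run_loop` are hypothetical, the step and tool-call limits mirror the default config on this page, and the 60-second limit is an assumption, not a recommendation.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Tuple


class StopReason(str, Enum):
    DONE = "done"
    MAX_STEPS = "max_steps"
    MAX_TOOL_CALLS = "max_tool_calls"
    TIMEOUT = "timeout"


@dataclass
class Budget:
    max_steps: int = 25        # mirrors the default config on this page
    max_tool_calls: int = 12   # mirrors the default config on this page
    max_seconds: float = 60.0  # assumption: choose a limit that fits your workload
    steps: int = 0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def exceeded(self) -> Optional[StopReason]:
        """Return the first exhausted limit, or None if the run may continue."""
        if self.steps >= self.max_steps:
            return StopReason.MAX_STEPS
        if self.tool_calls >= self.max_tool_calls:
            return StopReason.MAX_TOOL_CALLS
        if time.monotonic() - self.started >= self.max_seconds:
            return StopReason.TIMEOUT
        return None


def run_loop(step_fn: Callable[[], Tuple[bool, int]], budget: Budget) -> StopReason:
    """step_fn() performs one agent step and returns (done, tool_calls_made)."""
    while True:
        reason = budget.exceeded()
        if reason is not None:
            return reason  # surface this to the user and to your logs
        done, calls = step_fn()
        budget.steps += 1
        budget.tool_calls += calls
        if done:
            return StopReason.DONE
```

Because the check runs between steps, a single step can overshoot the tool-call limit; if that matters, the same counter can also be checked inside the tool gateway.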

Implementation example (real code)

This is the invariant that makes migration possible: all tools go through your gateway.

If you own the tool gateway, you can switch agent runtimes later without re-implementing your safety controls.

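PYTHON
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    allow: frozenset[str]             # default-deny: anything not listed is rejected
    require_approval: frozenset[str]  # tools that need a human sign-off first


class Denied(RuntimeError):
    pass


def require_human_approval(*, tool: str, args: dict[str, Any], tenant_id: str) -> str:
    """(pseudo) Block until a human approves, then return an approval token."""
    raise NotImplementedError("wire this to your approval flow")


def load_scoped_credentials(*, tool: str, tenant_id: str, env: str) -> Any:
    """(pseudo) Return least-privilege credentials for this tool, tenant, and env."""
    raise NotImplementedError("wire this to your secrets manager")


class ToolGateway:
    def __init__(self, *, policy: Policy, impls: dict[str, Callable[..., Any]]):
        self.policy = policy
        self.impls = impls

    def call(self, tool: str, args: dict[str, Any], *, tenant_id: str, env: str) -> Any:
        if tool not in self.policy.allow:
            raise Denied(f"not allowed: {tool}")
        fn = self.impls.get(tool)
        if fn is None:
            # Allowlisted but not registered: fail closed instead of raising KeyError.
            raise Denied(f"no implementation registered: {tool}")
        if tool in self.policy.require_approval:
            token = require_human_approval(tool=tool, args=args, tenant_id=tenant_id)
            args = {**args, "approval_token": token}

        # Credentials are scoped per tool, tenant, and environment.
        creds = load_scoped_credentials(tool=tool, tenant_id=tenant_id, env=env)
        return fn(args=args, creds=creds)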
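JAVASCRIPT
export class Denied extends Error {}

export class ToolGateway {
  constructor({ policy, impls }) {
    this.policy = policy; // { allow: string[], requireApproval: string[] }
    this.impls = impls;   // { [toolName]: async ({ args, creds }) => result }
  }

  async call({ tool, args, tenantId, env }) {
    // Default-deny: only allowlisted tools with a registered implementation run.
    if (!this.policy.allow.includes(tool)) throw new Denied("not allowed: " + tool);
    const fn = this.impls[tool];
    if (typeof fn !== "function") throw new Denied("no implementation registered: " + tool);

    if (this.policy.requireApproval.includes(tool)) {
      const token = await requireHumanApproval({ tool, args, tenantId }); // (pseudo)
      args = { ...args, approval_token: token };
    }
    // Credentials are scoped per tool, tenant, and environment.
    const creds = await loadScopedCredentials({ tool, tenantId, env }); // (pseudo)
    return fn({ args, creds });
  }
}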

Real incident (with numbers)

We saw a team adopt a managed agent runtime quickly (good call). Then they exposed write tools without building an external gateway (bad call).

A prompt injection payload in a ticket steered the agent into a write path.

Impact:

  • 9 bogus tickets created
  • ~45 minutes of cleanup + trust loss
  • they disabled the agent and rewired tools behind a gateway anyway

The lesson wasn’t “managed is bad”. The lesson was “tool governance can’t be implicit”.

Migration path (A → B)

Managed → custom (common)

  1. move tool calling behind your gateway first
  2. add budgets, stop reasons, monitoring outside the runtime
  3. replay traces to validate parity
  4. swap the runtime once governance is stable
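
Step 3 can start as a very plain comparison. The sketch below assumes a hypothetical trace shape (`run_id`, `input`, and a list of `steps` with a `tool` field) and a `candidate` function that runs the new runtime against the same input in a sandbox and returns the tool names it called. Neither is prescribed by this page, and since model output is not deterministic, exact sequence equality may be too strict for your case.

```python
from typing import Any, Callable


def replay_parity(
    traces: list[dict[str, Any]],
    candidate: Callable[[str], list[str]],
) -> list[str]:
    """Return the run_ids whose tool-call sequence differs under the candidate runtime."""
    mismatches: list[str] = []
    for trace in traces:
        expected = [step["tool"] for step in trace["steps"]]  # recorded behavior
        actual = candidate(trace["input"])                    # new runtime, same input
        if actual != expected:
            mismatches.append(trace["run_id"])
    return mismatches
```

A non-empty result is a prompt to look at those runs by hand before swapping the runtime, not an automatic verdict.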

Custom → managed (when you want speed)

  1. keep your gateway and logging
  2. use managed runtime for orchestration/model calls
  3. keep kill switch and approvals outside
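
Keeping the kill switch outside the runtime can be as simple as wrapping the tool caller. This is a minimal sketch under assumptions: `KillSwitch`, `Halted`, and `guarded` are hypothetical names, the wrapped function is assumed to take the same arguments as `ToolGateway.call` above, and a real deployment would read the switch state from shared storage rather than process memory.

```python
from typing import Any, Callable


class Halted(RuntimeError):
    """Raised when a kill switch blocks a tool call."""


class KillSwitch:
    def __init__(self) -> None:
        self.global_off = False            # stop everything
        self.tenants_off: set[str] = set() # stop individual tenants

    def check(self, tenant_id: str) -> None:
        if self.global_off or tenant_id in self.tenants_off:
            raise Halted(f"agent halted for tenant {tenant_id}")


def guarded(call_tool: Callable[..., Any], switch: KillSwitch) -> Callable[..., Any]:
    """Wrap a tool caller so the switch is checked before every call."""

    def wrapper(tool: str, args: dict[str, Any], *, tenant_id: str, env: str) -> Any:
        switch.check(tenant_id)
        return call_tool(tool, args, tenant_id=tenant_id, env=env)

    return wrapper
```

Because the check sits in front of the tool call, it keeps working no matter which runtime produces the call.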

Decision guide

  • If you can’t operate it without deep hooks → custom.
  • If speed matters and tools are low-risk → managed is fine.
  • If you’re multi-tenant with writes → gateway-first, regardless of runtime.

Trade-offs

  • Managed can reduce engineering load, but may constrain governance hooks.
  • Custom can do anything, including shipping without monitoring.
  • Migration is easier if governance is externalized.

When NOT to use it

  • Don’t pick managed if you can’t get audit logs or enforce permissions.
  • Don’t pick custom as an excuse to skip governance.
  • Don’t couple your tool calls directly to the model runtime.

Checklist (copy/paste)

  • [ ] Tool gateway you own (policy + approvals)
  • [ ] Budgets: steps/time/tool calls/USD
  • [ ] Stop reasons returned to users
  • [ ] Tracing: run_id/step_id + tool logs
  • [ ] Canary changes; expect drift
  • [ ] Replay traces for migrations
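
For the tracing item, a rough sketch of what one record per tool call could look like. `Tracer` and its field names beyond `run_id` and `step_id` are hypothetical; the sink defaults to `print` only to keep the example self-contained.

```python
import itertools
import json
import uuid
from typing import Any, Callable, Optional


class Tracer:
    """Emit one JSON line per tool call, keyed by run_id and step_id."""

    def __init__(self, run_id: Optional[str] = None,
                 sink: Callable[[str], Any] = print) -> None:
        self.run_id = run_id or uuid.uuid4().hex
        self._steps = itertools.count(1)  # step_id starts at 1 within a run
        self.sink = sink

    def record(self, tool: str, status: str, **extra: Any) -> dict[str, Any]:
        event = {
            "run_id": self.run_id,
            "step_id": next(self._steps),
            "tool": tool,
            "status": status,  # e.g. "ok", "denied", "error"
            **extra,
        }
        self.sink(json.dumps(event, sort_keys=True))
        return event
```

Logging denied calls as well as successful ones is what makes the records useful for audits and for replay during a migration.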

Safe default config (JSON/YAML)

YAML
architecture:
  tool_gateway: "owned"
  runtime: "managed_or_custom"
budgets:
  max_steps: 25
  max_tool_calls: 12
approvals:
  required_for: ["db.write", "email.send", "ticket.close"]
logging:
  tool_calls: true
  stop_reasons: true
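
A config like this is only a safe default if something refuses to start when it is wrong. Below is a minimal sketch that loads the JSON equivalent of the config above with the standard library and fails fast on missing or invalid values. `load_config` and its validation rules are illustrative assumptions; a YAML file would need a third-party parser, which is left out here.

```python
import json
from typing import Any

# JSON equivalent of the YAML config above.
CONFIG_JSON = """
{
  "architecture": {"tool_gateway": "owned", "runtime": "managed_or_custom"},
  "budgets": {"max_steps": 25, "max_tool_calls": 12},
  "approvals": {"required_for": ["db.write", "email.send", "ticket.close"]},
  "logging": {"tool_calls": true, "stop_reasons": true}
}
"""


def load_config(text: str) -> dict[str, Any]:
    """Parse the config and refuse to start on unsafe or missing values."""
    cfg = json.loads(text)
    for key in ("max_steps", "max_tool_calls"):
        value = cfg.get("budgets", {}).get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"budgets.{key} must be a positive integer")
    required = cfg.get("approvals", {}).get("required_for")
    if not isinstance(required, list) or not all(isinstance(t, str) for t in required):
        raise ValueError("approvals.required_for must be a list of tool names")
    for key in ("tool_calls", "stop_reasons"):
        if cfg.get("logging", {}).get(key) is not True:
            raise ValueError(f"logging.{key} must be enabled")
    return cfg
```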

FAQ

Will managed agents solve reliability for us?
Not automatically. Reliability comes from budgets, tool policy, observability, and safe failure paths.
What’s the non-negotiable piece to own?
The tool gateway and governance. That’s where side effects are controlled.
How do we avoid lock-in?
Keep tools, budgets, and logging outside the runtime. Then swapping runtimes is a controlled change.
What’s the first production control to add?
Default-deny tool allowlist + step/tool budgets + stop reasons.

⏱️ 6 min read · Updated Mar 2026 · Difficulty: ★★☆
Author

This documentation is curated and maintained by engineers who deploy AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

The patterns and recommendations are based on post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of agent governance infrastructure at OnceOnly.