OpenAI Agents vs Custom Agents (Production Comparison) + Code

  • Choose correctly, without demo-driven regret.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path and a decision checklist.
  • Take away defaults: budgets, validation, stop reasons.
Managed agent runtimes get you to a first result quickly. Custom agents give you control. This is a production comparison: governance, observability, failure handling, and a realistic migration path.

Problem (from practice)

At some point you’ll ask: “Do we use a managed agent platform or build our own?”

The wrong answer is religious:

  • “never build, always buy”
  • “never buy, always build”

In production, the real question is: where do you want the control layer to live?

Because the control layer (budgets, permissions, approvals, tracing) is the thing that decides whether your agent is a product feature or an incident generator.

Quick decision (who should pick what)

  • Pick OpenAI/managed agent frameworks when speed-to-first-version matters and your risk is low-to-medium.
  • Pick custom agents when you need deep integration with internal systems, strict governance, or unusual observability requirements.
  • Many teams should do both: start managed, then pull pieces in-house as the control layer hardens.

Why teams make the wrong choice in prod

1) They assume “managed” means “safe”

Managed doesn’t automatically mean:

  • least privilege
  • approvals for writes
  • multi-tenant isolation
  • audit logs that satisfy your compliance team

You still own safety. You just outsource some plumbing.

2) They assume “custom” means “more control”

Custom code can also mean:

  • no monitoring
  • no budgets
  • no replay

Control is a discipline, not a repo.

3) They lock into the wrong abstraction

If your architecture couples:

  • agent loop
  • tool gateway
  • observability

…you can’t migrate without a rewrite.

Keep your control layer framework-agnostic.
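
One way to keep those three pieces decoupled is to let the agent loop depend only on small interfaces. The following is a minimal sketch, not any platform's API: `ToolCaller`, `TraceSink`, `run_step`, `EchoTools`, and `ListTrace` are hypothetical names introduced for illustration.

```python
from typing import Any, Protocol


class ToolCaller(Protocol):
    """Anything that can execute a tool call (your gateway, a test fake)."""

    def call(self, tool: str, args: dict[str, Any]) -> Any: ...


class TraceSink(Protocol):
    """Anything that can record an event (stdout, your tracing backend)."""

    def record(self, event: dict[str, Any]) -> None: ...


def run_step(tools: ToolCaller, trace: TraceSink, tool: str, args: dict[str, Any]) -> Any:
    """Execute one tool call; the loop only knows the two interfaces above."""
    trace.record({"event": "tool_call", "tool": tool})
    result = tools.call(tool, args)
    trace.record({"event": "tool_result", "tool": tool})
    return result


class EchoTools:
    """Test fake: returns the call it received instead of doing real work."""

    def call(self, tool: str, args: dict[str, Any]) -> Any:
        return {"tool": tool, "args": args}


class ListTrace:
    """Test fake: keeps events in memory."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    def record(self, event: dict[str, Any]) -> None:
        self.events.append(event)
```

Because `run_step` never imports a runtime, swapping the runtime means writing a new adapter that satisfies `ToolCaller`, not rewriting the loop or the tracing.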

Comparison table

| Criterion | Managed agents | Custom agents | What matters in prod |
|---|---|---|---|
| Time to ship | Faster | Slower | Team velocity |
| Governance hooks | Varies | You decide | Can you enforce? |
| Observability | Varies | You build | Debuggability |
| Multi-tenant isolation | Varies | You build | Blast radius |
| Flexibility | Medium | High | Tool integration |
| Migration | Risky if coupled | Under your control | Avoid rewrites |

Where this breaks in production

Managed breaks when:

  • you can’t enforce your permission model
  • you can’t get the logs you need for audits/replay
  • you need custom gating (risk tiers, per-tenant kill switches)

Custom breaks when:

  • you skip the boring parts (budgets, stop reasons, tracing)
  • you build a “framework” nobody understands
  • you add features faster than you add monitoring
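
The "boring parts" are small. As a rough sketch of budgets plus explicit stop reasons, assuming a hypothetical `step_fn` that returns `(done, tool_calls_made)` for each agent step (the 60-second limit is an illustrative assumption, not a recommendation from this page):

```python
import time
from dataclasses import dataclass
from enum import Enum


class StopReason(str, Enum):
    DONE = "done"
    MAX_STEPS = "max_steps"
    MAX_TOOL_CALLS = "max_tool_calls"
    TIMEOUT = "timeout"


@dataclass
class Budget:
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: float = 60.0  # assumed value for illustration


def run(step_fn, budget: Budget, clock=time.monotonic) -> StopReason:
    """Call step_fn() until it reports completion or a budget is exhausted.

    step_fn is a hypothetical callable returning (done, tool_calls_in_this_step).
    """
    start = clock()
    tool_calls = 0
    for _ in range(budget.max_steps):
        if clock() - start > budget.max_seconds:
            return StopReason.TIMEOUT
        done, calls = step_fn()
        tool_calls += calls
        if done:
            return StopReason.DONE
        if tool_calls >= budget.max_tool_calls:
            return StopReason.MAX_TOOL_CALLS
    return StopReason.MAX_STEPS
```

Note that the tool-call budget is checked after each step, so a single step can overshoot it; a stricter variant would enforce the limit inside the gateway on every call.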

Implementation example (real code)

This is the invariant that makes migration possible: all tools go through your gateway.

If your tool gateway is yours, you can switch agent runtimes later without changing safety.

PYTHON
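from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    allow: set[str]             # default-deny: tools not listed here are rejected
    require_approval: set[str]  # tools that need a human sign-off before running


class Denied(RuntimeError):
    """Raised when the policy rejects a tool call."""


class ToolGateway:
    # approve and load_creds are hypothetical hooks: wire in your own approval
    # flow and scoped-credential loader.
    def __init__(self, *, policy: Policy, impls: dict[str, Callable[..., Any]],
                 approve: Callable[..., str], load_creds: Callable[..., Any]):
        self.policy = policy
        self.impls = impls
        self.approve = approve
        self.load_creds = load_creds

    def call(self, tool: str, args: dict[str, Any], *, tenant_id: str, env: str) -> Any:
        # Reject anything not allowlisted or without a registered implementation.
        if tool not in self.policy.allow or tool not in self.impls:
            raise Denied(f"not allowed: {tool}")
        if tool in self.policy.require_approval:
            token = self.approve(tool=tool, args=args, tenant_id=tenant_id)
            args = {**args, "approval_token": token}

        # Credentials are scoped per tool, tenant, and environment.
        creds = self.load_creds(tool=tool, tenant_id=tenant_id, env=env)
        return self.impls[tool](args=args, creds=creds)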
JAVASCRIPT
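export class Denied extends Error {}

export class ToolGateway {
  // approve and loadCreds are hypothetical hooks: wire in your own approval
  // flow and scoped-credential loader.
  constructor({ policy, impls, approve, loadCreds }) {
    this.policy = policy; // { allow: string[], requireApproval: string[] }
    this.impls = impls;
    this.approve = approve;
    this.loadCreds = loadCreds;
  }

  async call({ tool, args, tenantId, env }) {
    // Reject anything not allowlisted or without a registered implementation.
    if (!this.policy.allow.includes(tool) || typeof this.impls[tool] !== "function") {
      throw new Denied("not allowed: " + tool);
    }
    if (this.policy.requireApproval.includes(tool)) {
      const token = await this.approve({ tool, args, tenantId });
      args = { ...args, approval_token: token };
    }
    // Credentials are scoped per tool, tenant, and environment.
    const creds = await this.loadCreds({ tool, tenantId, env });
    return this.impls[tool]({ args, creds });
  }
}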

Real incident (with numbers)

We saw a team adopt a managed agent runtime quickly (good call). Then they exposed write tools without building an external gateway (bad call).

A prompt injection payload in a ticket steered the agent into a write path.

Impact:

  • 9 bogus tickets created
  • ~45 minutes of cleanup + trust loss
  • they disabled the agent and rewired tools behind a gateway anyway

The lesson wasn’t “managed is bad”. The lesson was “tool governance can’t be implicit”.

Migration path (A → B)

Managed → custom (common)

  1. move tool calling behind your gateway first
  2. add budgets, stop reasons, monitoring outside the runtime
  3. replay traces to validate parity
  4. swap the runtime once governance is stable
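
Step 3 can be approximated by comparing the ordered tool calls that each runtime produces for the same recorded input. This is a minimal sketch that assumes traces are stored as lists of dicts with `tool` and `args` keys; that format and the function names are hypothetical.

```python
from typing import Any

Trace = list[dict[str, Any]]


def tool_signature(trace: Trace) -> list[tuple[str, tuple]]:
    """Reduce a trace to ordered (tool, sorted-args) pairs, ignoring timing and ids."""
    return [(step["tool"], tuple(sorted(step["args"].items()))) for step in trace]


def parity_diff(old: Trace, new: Trace) -> list[str]:
    """Return one message per step where the two runtimes diverge."""
    a, b = tool_signature(old), tool_signature(new)
    diffs = []
    for i in range(max(len(a), len(b))):
        left = a[i] if i < len(a) else None
        right = b[i] if i < len(b) else None
        if left != right:
            diffs.append(f"step {i}: {left} != {right}")
    return diffs
```

An empty result only shows that both runtimes made the same tool calls in the same order. It says nothing about the quality of the final answer, so treat it as a gate for side effects rather than a full equivalence check.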

Custom → managed (when you want speed)

  1. keep your gateway and logging
  2. use managed runtime for orchestration/model calls
  3. keep kill switch and approvals outside
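
Keeping the kill switch outside the runtime can be as simple as a check that runs before every call. A minimal in-memory sketch follows; `KillSwitch`, `Halted`, and `guarded_call` are hypothetical names, and a real deployment would presumably read this state from a shared store so that flipping it takes effect across processes.

```python
from dataclasses import dataclass, field


class Halted(RuntimeError):
    """Raised when a kill switch blocks a call."""


@dataclass
class KillSwitch:
    global_off: bool = False
    tenants_off: set[str] = field(default_factory=set)

    def check(self, tenant_id: str) -> None:
        """Raise Halted if the agent is disabled globally or for this tenant."""
        if self.global_off:
            raise Halted("agent disabled globally")
        if tenant_id in self.tenants_off:
            raise Halted(f"agent disabled for tenant {tenant_id}")


def guarded_call(switch: KillSwitch, tenant_id: str, fn, *args, **kwargs):
    """Run fn only if the kill switch allows it for this tenant."""
    switch.check(tenant_id)
    return fn(*args, **kwargs)
```

Disabling one tenant (`switch.tenants_off.add("t1")`) leaves the others running, which keeps the blast radius of an incident small.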

Decision guide

  • If you can’t operate it without deep hooks → custom.
  • If speed matters and tools are low-risk → managed is fine.
  • If you’re multi-tenant with writes → gateway-first, regardless of runtime.

Trade-offs

  • Managed can reduce engineering load, but may constrain governance hooks.
  • Custom can do anything, including shipping without monitoring.
  • Migration is easier if governance is externalized.

When NOT to use it

  • Don’t pick managed if you can’t get audit logs or enforce permissions.
  • Don’t pick custom as an excuse to skip governance.
  • Don’t couple your tool calls directly to the model runtime.

Checklist (copy/paste)

  • [ ] Tool gateway you own (policy + approvals)
  • [ ] Budgets: steps/time/tool calls/USD
  • [ ] Stop reasons returned to users
  • [ ] Tracing: run_id/step_id + tool logs
  • [ ] Canary changes; expect drift
  • [ ] Replay traces for migrations
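
For the tracing item, one JSON line per tool call keyed by `run_id` and `step_id` is usually enough to reconstruct a run. This is a sketch with assumed field names (`ts`, `ok`, `stop_reason`); adapt them to whatever your log pipeline expects.

```python
import json
import time
import uuid
from typing import Any, Optional


def new_run_id() -> str:
    """Generate an opaque identifier shared by every step of one agent run."""
    return uuid.uuid4().hex


def tool_log_record(run_id: str, step_id: int, tool: str, ok: bool,
                    stop_reason: Optional[str] = None) -> str:
    """Build one JSON log line for a tool call."""
    record: dict[str, Any] = {
        "ts": time.time(),
        "run_id": run_id,
        "step_id": step_id,
        "tool": tool,
        "ok": ok,
    }
    # Only the final step of a run carries a stop reason.
    if stop_reason is not None:
        record["stop_reason"] = stop_reason
    return json.dumps(record, sort_keys=True)
```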

Safe default config snippet (JSON/YAML)

YAML
architecture:
  tool_gateway: "owned"
  runtime: "managed_or_custom"
budgets:
  max_steps: 25
  max_tool_calls: 12
approvals:
  required_for: ["db.write", "email.send", "ticket.close"]
logging:
  tool_calls: true
  stop_reasons: true
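
A config like this only helps if something refuses to start when it is wrong. The sketch below checks the same structure at startup. It uses a plain dict that mirrors the YAML so the example stays standard-library only (parsing is left out), and `validate_config` is a hypothetical helper, not part of any library.

```python
from typing import Any

CONFIG: dict[str, Any] = {
    "architecture": {"tool_gateway": "owned", "runtime": "managed_or_custom"},
    "budgets": {"max_steps": 25, "max_tool_calls": 12},
    "approvals": {"required_for": ["db.write", "email.send", "ticket.close"]},
    "logging": {"tool_calls": True, "stop_reasons": True},
}


def validate_config(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config passes these checks."""
    problems = []
    if cfg.get("architecture", {}).get("tool_gateway") != "owned":
        problems.append("tool_gateway must be 'owned'")
    for key in ("max_steps", "max_tool_calls"):
        value = cfg.get("budgets", {}).get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"budgets.{key} must be a positive integer")
    for key in ("tool_calls", "stop_reasons"):
        if cfg.get("logging", {}).get(key) is not True:
            problems.append(f"logging.{key} must be enabled")
    return problems
```

Failing closed on a missing budget is the point: a config without limits should be treated as invalid rather than as "unlimited".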

FAQ

Q: Will managed agents solve reliability for us?
A: Not automatically. Reliability comes from budgets, tool policy, observability, and safe failure paths.

Q: What’s the non-negotiable piece to own?
A: The tool gateway and governance. That’s where side effects are controlled.

Q: How do we avoid lock-in?
A: Keep tools, budgets, and logging outside the runtime. Then swapping runtimes is a controlled change.

Q: What’s the first production control to add?
A: Default-deny tool allowlist + step/tool budgets + stop reasons.

⏱️ 6 min read · Updated Mar 2026 · Difficulty: ★★☆
Author

This documentation is curated and maintained by engineers who run AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are based on post-mortems, failure modes, and operational incidents in production systems, including the development and operation of governance infrastructure for agents at OnceOnly.