The problem
Multi-tenant agents fail in predictable ways:
- one tenant’s data leaks into another tenant’s context,
- a tool call runs with the wrong tenant credentials,
- retries multiply writes because idempotency is missing,
- logs are too thin to prove what happened.
This is rarely a model problem. It’s almost always missing isolation in the runtime.
Non-negotiables
1) Bind tenant context before the agent runs
Tenant identity must come from authentication and routing, never from model output.
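As a minimal sketch of this rule, the tenant can be fixed in an immutable object before the agent loop starts. The names TenantContext, bind_tenant, and resolve_tenant, and the shape of the verified_claims dict, are assumptions made for illustration, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Immutable tenant identity, fixed before the agent loop starts."""
    tenant_id: str
    environment: str


def bind_tenant(verified_claims: dict, environment: str) -> TenantContext:
    """Build the context from already-verified auth claims (hypothetical shape)."""
    tenant_id = verified_claims.get("tenant_id")
    if not tenant_id:
        raise PermissionError("request has no authenticated tenant")
    return TenantContext(tenant_id=tenant_id, environment=environment)


def resolve_tenant(ctx: TenantContext, model_args: dict) -> str:
    """Always return the bound tenant; reject a conflicting tenant named by the model."""
    claimed = model_args.get("tenant_id")
    if claimed is not None and claimed != ctx.tenant_id:
        raise PermissionError("model-supplied tenant does not match bound tenant")
    return ctx.tenant_id
```

The runtime would call bind_tenant once per request and pass the resulting context down to every component. Nothing the model emits can change it.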
2) Scope every tool call to the tenant
Tools must receive tenant-scoped credentials and tenant-scoped resource IDs.
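One way to enforce this is a small gateway that sits between the agent and its tools. The sketch below assumes an in-memory credential map and a resource ID convention of "<tenant_id>/<name>". Both are illustrative choices, and ToolGateway and read_document are hypothetical names.

```python
from typing import Callable, Dict


class ToolGateway:
    """Hands each tool call the bound tenant's credential, never a shared one."""

    def __init__(self, credentials: Dict[str, str]):
        # tenant_id -> secret; a real system would use a secrets manager (assumption).
        self._credentials = credentials

    def call(self, tenant_id: str, tool: Callable[[str, str], str], resource_id: str) -> str:
        credential = self._credentials.get(tenant_id)
        if credential is None:
            raise PermissionError(f"no credential for tenant {tenant_id!r}")
        # Assumed convention: resource IDs are namespaced as "<tenant_id>/<name>".
        if not resource_id.startswith(tenant_id + "/"):
            raise PermissionError("resource is outside the tenant's namespace")
        return tool(credential, resource_id)


def read_document(credential: str, resource_id: str) -> str:
    """Stand-in tool that reports which credential and resource it was given."""
    return f"read {resource_id} with {credential}"
```

Because tools only ever receive a credential from the gateway, no global client is available to be reused across tenants.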
3) Separate state (and caches) per tenant
Memory, artifacts, and caches must be keyed by tenant (and usually by environment).
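A sketch of what keyed by tenant can look like for a cache: the tenant and environment are part of the key itself, so a lookup made for one tenant cannot return another tenant's entry. TenantCache is a hypothetical in-memory stand-in for whatever store the runtime actually uses.

```python
from typing import Dict, Optional, Tuple


class TenantCache:
    """Cache whose keys always include tenant and environment."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, str], str] = {}

    def put(self, tenant_id: str, environment: str, key: str, value: str) -> None:
        self._store[(tenant_id, environment, key)] = value

    def get(self, tenant_id: str, environment: str, key: str) -> Optional[str]:
        # A miss for this tenant stays a miss even if another tenant cached the same key.
        return self._store.get((tenant_id, environment, key))
```

With this shape, two tenants issuing the identical query each get their own cache entry.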
4) Per-tenant budgets and rate limits
Budgets and rate limits must apply per tenant so one tenant can’t burn through the whole system’s budget.
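The budget half of this can be sketched as a per-tenant counter that is checked before every step. The limits, the class names, and the idea of charging a dollar cost per step are assumptions for illustration. Rate limiting (for example a token bucket per tenant) would follow the same per-tenant keying and is not shown.

```python
from dataclasses import dataclass
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a tenant has used up its own step or cost budget."""


@dataclass
class Budget:
    max_steps: int
    max_cost_usd: float
    steps_used: int = 0
    cost_used_usd: float = 0.0


class TenantBudgets:
    """Tracks usage separately for each tenant."""

    def __init__(self, max_steps: int, max_cost_usd: float) -> None:
        self._max_steps = max_steps
        self._max_cost_usd = max_cost_usd
        self._budgets: Dict[str, Budget] = {}

    def charge(self, tenant_id: str, cost_usd: float) -> None:
        """Record one step for the tenant, or raise if it would exceed the budget."""
        budget = self._budgets.setdefault(
            tenant_id, Budget(self._max_steps, self._max_cost_usd)
        )
        if budget.steps_used + 1 > budget.max_steps:
            raise BudgetExceeded(f"{tenant_id}: step budget exhausted")
        if budget.cost_used_usd + cost_usd > budget.max_cost_usd:
            raise BudgetExceeded(f"{tenant_id}: cost budget exhausted")
        budget.steps_used += 1
        budget.cost_used_usd += cost_usd

    def steps_used(self, tenant_id: str) -> int:
        budget = self._budgets.get(tenant_id)
        return budget.steps_used if budget else 0
```

A tenant that hits its limit is stopped while every other tenant keeps its full allowance.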
Diagram (tenant-scoped tool gateway)
Common failure modes
- Credential bleed: shared API keys or global clients reused across tenants.
- Cache bleed: retrieval caches keyed only by URL/query, not tenant.
- Write duplication: retries without idempotency keys.
- Silent partial writes: the write in step N succeeds, step N+1 fails, and the state is left inconsistent.
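For the write duplication case, a minimal sketch of idempotency keys: the key is derived from values that stay the same across retries of one logical write, and the writer returns the stored result instead of writing again. The key fields (tenant, run, step, tool name) and the in-memory IdempotentWriter are assumptions. A real system would persist the keys next to the data.

```python
import hashlib
from typing import Dict


def idempotency_key(tenant_id: str, run_id: str, step: int, tool_name: str) -> str:
    """Derive a key that is identical for every retry of the same logical write."""
    raw = f"{tenant_id}:{run_id}:{step}:{tool_name}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IdempotentWriter:
    """Performs each keyed write at most once and replays the stored result after that."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.write_count = 0  # number of writes actually performed

    def write(self, key: str, payload: str) -> str:
        if key in self._results:
            # Retry of a write that already happened: return the earlier result.
            return self._results[key]
        self.write_count += 1
        result = f"written:{payload}"
        self._results[key] = result
        return result
```

Including the tenant in the key also keeps two tenants with coincidentally equal run IDs from sharing a stored result.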
Minimum controls to ship
- Default-deny tool allowlists, scoped per tenant and environment.
- Idempotency keys for all writes.
- Per-tenant budgets (steps/seconds/$) + per-tenant rate limits.
- Full traces with tenant_id + stop_reason on every run.
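Two of these controls are small enough to sketch together: a default-deny allowlist lookup and a trace record that refuses to be written without tenant_id and stop_reason. The allowlist contents, tool names, and record fields below are made up for illustration.

```python
import json
from typing import Dict, FrozenSet, Tuple

# (tenant_id, environment) -> allowed tool names; the entries here are invented.
ALLOWLIST: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("acme", "prod"): frozenset({"search_docs"}),
    ("acme", "staging"): frozenset({"search_docs", "send_email"}),
}


def is_tool_allowed(tenant_id: str, environment: str, tool_name: str) -> bool:
    """Default deny: unknown tenants, environments, and tools all return False."""
    return tool_name in ALLOWLIST.get((tenant_id, environment), frozenset())


def trace_record(tenant_id: str, run_id: str, stop_reason: str, steps: int) -> str:
    """Serialize one run's trace; tenant_id and stop_reason are mandatory."""
    if not tenant_id or not stop_reason:
        raise ValueError("tenant_id and stop_reason are required on every trace")
    return json.dumps(
        {
            "tenant_id": tenant_id,
            "run_id": run_id,
            "stop_reason": stop_reason,
            "steps": steps,
        },
        sort_keys=True,
    )
```

Making the two fields mandatory at write time means a run can never end up in the logs without a tenant and a reason it stopped.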