Pattern essence
Supervisor Agent is a pattern where a separate agent controls execution: checks proposed actions, applies rules, and decides whether execution can continue.
When to use it: when critical actions must be separately approved by policy before continuing.
Instead of unconditionally trusting the worker, Supervisor:
- checks every critical action
- compares it with policies
- returns a decision: approve, revise, block, or escalate
- logs the reason for the decision
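The four outcomes and the logged reason can be modeled as a small value type. This is a minimal sketch: the names `DecisionType` and `Decision` are illustrative assumptions, not part of any specific framework.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionType(str, Enum):
    """The four outcomes a supervisor can return (values taken from the text)."""
    APPROVE = "approve"
    REVISE = "revise"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Decision:
    """Hypothetical container: a decision always carries its reason."""
    type: DecisionType
    reason: str

    def __post_init__(self) -> None:
        # An empty reason would make the audit trail useless, so reject it.
        if not self.reason.strip():
            raise ValueError("a decision must include a reason")


decision = Decision(DecisionType.ESCALATE, "refund exceeds auto-approval limit")
print(decision.type.value, "-", decision.reason)
# prints: escalate - refund exceeds auto-approval limit
```

Making the reason mandatory is a design choice of this sketch: it keeps "decide" and "log the reason" from drifting apart.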

Problem
Imagine a worker agent is running a task in production and has access to tools.
It proposes a technically valid action that violates policy:
- email to the wrong audience
- SQL query with data-change risk
- spending above budget
- access to sensitive data without required access scope
A technically possible step is not always allowed or safe for the business.
Without separate control, this may go directly into execution.
Consequences:
- incidents in production
- security and compliance violations
- unexpected financial losses
- weak audit and difficult postmortem analysis
That is the problem: a worker can propose an action that "works technically", but is unacceptable by policy.
Solution
Supervisor adds a policy-control layer between action proposal and execution.
Analogy: this is like technical review before production release. The worker proposes a step, and the supervisor checks whether it is safe and allowed. Only after that can the action be executed.
Key principle: first supervisor verification and decision, then execution.
The worker proposes an action, and supervisor-policy returns one of: approve, revise, block, or escalate.
Controlled process:
- Observe: get action proposal from worker
- Evaluate: check against policy + execution runtime state
- Decide: approve/revise/block/escalate
- Enforce: execute, return for revision, or stop
- Log: record decision and reason
This gives:
- lower risk of unsafe actions before execution
- a single enforcement point the worker cannot skip, as long as it has no direct access to the execution layer
- controlled escalation to a human
- transparent audit of decisions
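The Log step and the audit benefit listed above depend on recording each decision in a form that can be queried later. The sketch below writes one JSON line per decision; the field names and the `audit_record` helper are assumptions made for illustration, not a prescribed schema.

```python
import json
from datetime import datetime, timezone


def audit_record(action: dict, decision: str, reason: str) -> str:
    """Serialize one supervisor decision as a JSON line (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "decision": decision,
        "reason": reason,
    }
    # sort_keys keeps the key order stable, which makes log diffs easier to read.
    return json.dumps(record, sort_keys=True)


# An in-memory list stands in for a real append-only log store.
audit_log: list[str] = []
audit_log.append(
    audit_record(
        action={"tool": "send_email", "audience": "all_customers"},
        decision="block",
        reason="audience is wider than the campaign allows",
    )
)
print(audit_log[0])
```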
Works well if:
- worker has no direct bypass of the execution layer
- policy checks intent + runtime context
- supervisor decisions are actually enforced
- high-risk actions are not auto-approved
The model may "want" to execute immediately, but supervisor-policy is what determines whether the action is allowed at all.
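One way to meet the "no direct bypass" condition is to keep the real tool handles inside a gate that runs a tool only after an approve decision. The following is a sketch under that assumption; `ExecutionGate`, the toy `review` policy, and the tool names are all hypothetical.

```python
from typing import Callable, Dict


class ExecutionGate:
    """Hypothetical gate: the only component that holds real tool handles."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., str]],
        review: Callable[[str, dict], str],
    ) -> None:
        self._tools = tools    # never handed to the worker directly
        self._review = review  # supervisor policy: returns a decision string

    def run(self, tool: str, args: dict) -> str:
        decision = self._review(tool, args)
        if decision != "approve":
            # revise / block / escalate all mean: do not execute now.
            raise PermissionError(f"{tool} not executed: decision={decision}")
        return self._tools[tool](**args)


def review(tool: str, args: dict) -> str:
    # Assumed toy policy: high-risk tools are never auto-approved.
    high_risk = {"delete_records", "issue_refund"}
    return "escalate" if tool in high_risk else "approve"


gate = ExecutionGate(
    tools={
        "lookup_order": lambda order_id: f"order {order_id}: shipped",
        "issue_refund": lambda amount: f"refunded {amount}",
    },
    review=review,
)

print(gate.run("lookup_order", {"order_id": "A-17"}))  # order A-17: shipped
try:
    gate.run("issue_refund", {"amount": 5000})
except PermissionError as err:
    print(err)  # issue_refund not executed: decision=escalate
```

In Python the leading underscore is only a convention. A real deployment would likely enforce the separation with process boundaries or credentials that only the gate holds.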
How it works
Supervisor does not execute the business task itself.
It controls whether the next step can be executed by checking:
- safety
- budget
- permissions
- compliance
- stop conditions
Full flow description: Observe → Evaluate → Decide → Enforce
Observe
Supervisor receives a plan or action from Worker Agent.
Evaluate
Compares the action with policies and current state: spending limit, tool type, risk level, data sensitivity.
Decide
Returns one decision: approve, revise, block, or escalate.
Enforce
The system enforces the decision: executes action, returns it for revision, stops execution, or sends for human approval.
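The Evaluate and Decide steps can be sketched as a pure function over the action, the policy, and the current state. The `Action` and `Policy` types, the field names, and the order of the rules are assumptions of this example; a real policy would likely be richer.

```python
from dataclasses import dataclass


@dataclass
class Action:
    tool: str
    cost_usd: float
    risk: str  # "low", "medium", or "high"
    touches_sensitive_data: bool


@dataclass
class Policy:
    allowed_tools: frozenset
    spending_limit_usd: float


def evaluate(action: Action, policy: Policy, spent_usd: float) -> tuple[str, str]:
    """Return (decision, reason). Rule order is an assumption of this sketch."""
    if action.tool not in policy.allowed_tools:
        return "block", f"tool '{action.tool}' is not allowed"
    if spent_usd + action.cost_usd > policy.spending_limit_usd:
        return "revise", "action would exceed the spending limit"
    if action.risk == "high" or action.touches_sensitive_data:
        return "escalate", "high-risk or sensitive action needs human approval"
    return "approve", "within policy"


policy = Policy(allowed_tools=frozenset({"search", "send_email"}),
                spending_limit_usd=10.0)

print(evaluate(Action("search", 0.5, "low", False), policy, spent_usd=1.0))
# ('approve', 'within policy')
print(evaluate(Action("drop_table", 0.0, "high", False), policy, spent_usd=1.0))
# ('block', "tool 'drop_table' is not allowed")
print(evaluate(Action("send_email", 0.1, "medium", True), policy, spent_usd=1.0))
# ('escalate', 'high-risk or sensitive action needs human approval')
```

Keeping the function free of side effects makes each rule easy to unit-test, and leaves enforcement to a separate component.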
In code it looks like this
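proposal = worker.next_action(context)

# Every proposal passes through the supervisor before it can run.
decision = supervisor.review(
    action=proposal,
    budget_state=budget_state,
    policy=policy,
)

if decision.type == "approve":
    result = execute(proposal)
    context.append(result)
elif decision.type == "revise":
    # Feed the reason back so the worker can propose a corrected action.
    context.append(f"Supervisor feedback: {decision.reason}")
elif decision.type == "escalate":
    # Execution pauses here until a human decides.
    wait_for_human_approval(proposal)
else:
    # "block" (and any unrecognized decision type) stops execution.
    stop(reason=decision.reason)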
Supervisor does not replace Worker Agent. It adds a check between planning and execution.
How it looks during execution
Goal: process customer refund
Worker proposal:
- refund 5000 USD
- send confirmation email
Supervisor:
- policy check: auto-refund allowed only up to 1000 USD
- decision: escalate, human confirmation required
Human approval (approve with changes):
- approved_refund_amount: 800 USD
- comment: "Approve refund only within 800 USD"
Execution:
- refund 800 USD (amount from human decision)
- send confirmation email
Status: done
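The trace above, including the "approve with changes" outcome, can be reproduced with a small function. This is a sketch only: `process_refund` and the `ask_human` callback are hypothetical stand-ins for a real refund tool and approval UI.

```python
from typing import Callable, Optional

AUTO_REFUND_LIMIT_USD = 1000  # auto-refund limit from the policy check above


def process_refund(
    requested_usd: int,
    ask_human: Callable[[int], Optional[int]],
) -> list[str]:
    """Return the executed steps; ask_human returns an approved amount or None."""
    steps: list[str] = []
    amount = requested_usd
    if requested_usd > AUTO_REFUND_LIMIT_USD:
        steps.append("supervisor: escalate")
        approved = ask_human(requested_usd)
        if approved is None:
            # The human rejected the request: nothing is executed.
            steps.append("status: stopped")
            return steps
        # The human's amount replaces the worker's, but never exceeds the request.
        amount = min(approved, requested_usd)
    else:
        steps.append("supervisor: approve")
    steps.append(f"refund {amount} USD")
    steps.append("send confirmation email")
    steps.append("status: done")
    return steps


# Stand-in for a real approval UI: the reviewer approves only 800 USD.
print(process_refund(5000, ask_human=lambda requested: 800))
# ['supervisor: escalate', 'refund 800 USD', 'send confirmation email', 'status: done']
```

The key detail is that the executed amount comes from the human decision, not from the worker's original proposal.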
When it fits - and when it does not
Fits
| Situation | Why Supervisor fits |
|---|---|
| There are risky or expensive actions | Supervisor checks a step before execution and reduces risk of expensive mistakes. |
| Security and compliance policy control is required | The pattern applies admission rules and blocks actions that violate policy. |
| Budget, access, and tool limits are important | Supervisor keeps constraints centralized and prevents bypass during execution. |
| An audit trail of decisions and reasons is required | Each approve, block, or escalate decision can be recorded for audit. |
Does not fit
| Situation | Why Supervisor does not fit |
|---|---|
| Read-only task without risky steps | Additional control barely changes the risk but complicates the flow. |
| Latency-critical path where an additional gate is unacceptable | Checking before each action may add unacceptable latency. |
| All security is strictly enforced at infrastructure level | Supervisor duplicates controls that are already enforced and adds little real value. |
In these cases Supervisor adds an extra verification step and may slow execution without a matching reduction in risk.
How it differs from Guarded-Policy
| | Guarded-Policy | Supervisor |
|---|---|---|
| Main role | Automatically filters actions with strict rules | Evaluates the situation and decides whether it is safe to continue |
| When applied | Before each potentially risky action | At control points: before important steps or final output |
| Decision type | allow / deny / rewrite / escalate | approve / revise / block / escalate |
| Strong side | Stable, identical rules for all requests | Flexible control where context and human logic are needed |
In short: Supervisor is an oversight layer for complex decisions.
Guarded-Policy is an automatic rules-based barrier.
When to use Supervisor (vs other patterns)
Use Supervisor when oversight and a policy check are needed before a final result or a risky action.
Quick test:
- if you need to "check policies, compliance, and risks" -> Supervisor
- if you need to "only manage step order" -> Orchestrator Agent
- if you need to "automatically block/rewrite actions before execution" -> Guarded-Policy Agent
Comparison with other patterns and examples
Quick cheat sheet:
| If the task looks like this... | Use |
|---|---|
| You need to choose one best executor | Routing Agent |
| There is a sequence of steps and order matters | Orchestrator Agent |
| You need a policy-check before result | Supervisor Agent |
| Multiple agents must reach one conclusion | Multi-Agent Collaboration |
Examples:
Routing: "Customer asks for a refund - send to Billing, not Sales".
Orchestrator: "Prepare a release: first changelog, then QA, then deploy".
Supervisor: "Before sending an email, check policies, compliance, and prohibited promises".
Multi-Agent Collaboration: "Marketing, Legal, and Product must agree on one final campaign text".
How to combine with other patterns
- Supervisor + ReAct: Supervisor checks each Act step before tool execution.
- Supervisor + Routing: Supervisor controls not only the action itself but also which agent the task was routed to.
- Supervisor + Orchestrator: policies and limits are applied to each parallel branch, not only to the final result.
In short
Supervisor Agent:
- checks actions before execution
- applies policies and limits
- blocks or escalates risky steps
- reduces production incident risk
Pros and cons
Pros
- controls actions of other agents
- stops risky steps before execution
- aligns priorities and resources
- improves system controllability
Cons
- adds delay for checks
- clear escalation rules are required
- can become a bottleneck
FAQ
Q: Does Supervisor replace infrastructure permissions (RBAC, ACL)?
A: No. This is an additional logical control. Basic technical restrictions should remain in infrastructure.
Q: What if Supervisor blocks useful actions too often?
A: Refine policies: add exceptions, risk levels, and human approval rules for gray scenarios.
Q: Can Supervisor become a single point of failure?
A: Yes, if no fallback path exists. Therefore, systems usually add a timeout, a safe default policy, and a graceful degradation mode.
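A timeout with a safe default can be sketched with the standard library. The example assumes a fail-closed default (block when the supervisor is unavailable); `review_with_timeout` and the sample review functions are hypothetical, and whether to fail closed or open is a policy choice for each system.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def review_with_timeout(review, action, timeout_s: float):
    """Run review(action); fall back to a 'block' decision on timeout or error."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(review, action)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return ("block", "supervisor timed out; failing closed")
    except Exception:
        return ("block", "supervisor error; failing closed")
    finally:
        # Do not wait for a hung review; the worker thread finishes on its own.
        pool.shutdown(wait=False)


def fast_review(action):
    return ("approve", "within policy")


def slow_review(action):
    time.sleep(0.5)  # simulates an overloaded supervisor
    return ("approve", "within policy")


print(review_with_timeout(fast_review, {"tool": "search"}, timeout_s=1.0))
# ('approve', 'within policy')
print(review_with_timeout(slow_review, {"tool": "search"}, timeout_s=0.05))
# ('block', 'supervisor timed out; failing closed')
```

A thread cannot be forcibly cancelled in Python, so this sketch only stops waiting for the result. A production setup would more likely put the timeout on the network call to the supervisor service.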
What next
Supervisor ensures that individual agents stay within policy boundaries.
But what should you do when multiple agents need to collaborate on one shared task?