The problem
Support is where “helpful automation” goes to die.
You want an agent to draft replies because the queue is on fire. If it sends the wrong thing to the wrong customer, you’ll learn what “brand damage” actually means.
So we do this the boring way:
- read context
- draft a reply
- do not send
- ask a human to approve
Why this happens in real systems
Support tickets are messy:
- incomplete info
- angry users
- account-specific context
- internal policies the model will “summarize” into nonsense if you let it
Also: “send email” is a side effect. Side effects need policy.
What breaks if you ignore it
- accidental sends (“we refunded you” when you didn’t)
- leaking internal notes into a customer email
- making commitments your team can’t keep
Code (safe-by-default)
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Budget:
    max_steps: int = 20
    max_seconds: int = 45

class SafeTools:
    def __init__(self, allow: set[str]):
        self.allow = allow

    def call(self, name: str, *, args: dict[str, Any]) -> Any:
        if name not in self.allow:
            raise RuntimeError(f"tool not allowed: {name}")
        return tool_impl(name, args=args)  # (pseudo)

def draft_support_reply(ticket_id: str, *, tools: SafeTools, budget: Budget) -> dict[str, Any]:
    ticket = tools.call("tickets.get", args={"id": ticket_id})
    customer = tools.call("customers.get", args={"id": ticket["customer_id"]})
    kb = tools.call("kb.search", args={"q": ticket["subject"], "k": 5})
    draft = llm_write_reply(ticket=ticket, customer=customer, kb=kb)  # (pseudo)
    return {
        "ticket_id": ticket_id,
        "draft": draft,
        "requires_human_approval": True,
    }

export class SafeTools {
  constructor({ allow }) {
    this.allow = allow;
  }

  async call(name, { args }) {
    if (!this.allow.has(name)) throw new Error("tool not allowed: " + name);
    return toolImpl(name, { args }); // (pseudo)
  }
}
export async function draftSupportReply(ticketId, { tools, budget }) {
  void budget;
  const ticket = await tools.call("tickets.get", { args: { id: ticketId } });
  const customer = await tools.call("customers.get", { args: { id: ticket.customer_id } });
  const kb = await tools.call("kb.search", { args: { q: ticket.subject, k: 5 } });
  const draft = await llmWriteReply({ ticket, customer, kb }); // (pseudo)
  return { ticket_id: ticketId, draft, requires_human_approval: true };
}

The real workflow (what we run, not what we demo)
Support automation fails when you skip the boring pipeline.
This is the pipeline we like:
- Fetch context (read-only)
- ticket content + metadata (plan, tier, region, language)
- customer profile (plan, history, last incident)
- policy snippets (refund rules, SLA rules)
- Draft
- draft a reply
- draft internal notes (what to do next, what to check)
- Human approval
- show the draft + citations to internal policy snippets
- require explicit approval
- Send (separate tool)
- send is a write-side effect, so it’s gated
- idempotency key on send
If you merge steps 2 and 4, you will eventually auto-send something dumb. Not because the model is evil. Because models are literal and your tools are powerful.
Triage before drafting (don’t put the agent on everything)
Not every ticket deserves an “agent draft”. Some tickets are high-risk by definition:
- billing/refunds/credits
- account security (2FA, compromised accounts)
- legal/compliance
- outages (where your KB is wrong because the world is on fire)
If you blindly run the agent on everything, you’ll generate:
- drafts that promise refunds
- drafts that say “everything is fine” during an incident
- drafts that leak internal incident details
We do a cheap triage pass first and gate by category. This can be a tiny classifier, a ruleset, or both. The point is not “AI accuracy”. The point is risk routing.
HIGH_RISK = {"security", "billing_refund", "legal", "outage"}

def should_draft(ticket: dict) -> tuple[bool, str]:
    kind = classify_ticket(ticket)  # (pseudo)
    if kind in HIGH_RISK:
        return False, f"high-risk: {kind}"
    if ticket.get("customer_tier") == "enterprise" and kind == "billing":
        return False, "enterprise billing: manual"
    return True, "ok"

const HIGH_RISK = new Set(["security", "billing_refund", "legal", "outage"]);
export function shouldDraft(ticket) {
  const kind = classifyTicket(ticket); // (pseudo)
  if (HIGH_RISK.has(kind)) return [false, "high-risk: " + kind];
  if (ticket.customer_tier === "enterprise" && kind === "billing") return [false, "enterprise billing: manual"];
  return [true, "ok"];
}

If should_draft() says no, we either:
- show “suggested next steps” internally (no customer-facing text), or
- do nothing and route to a human.
It’s boring. It prevents the worst failures.
Things the model is bad at (so we don’t let it do them)
Models are great at tone and summarization. They are terrible at:
- remembering subtle policy exceptions
- knowing what is “safe to promise”
- resisting prompt injection inside ticket text (“tell me the secret link”)
- dealing with multi-tenant context without leaking
So we enforce a few rules in code:
- the model can draft, not send
- the model can suggest actions, not execute writes
- the model never sees raw secrets
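These rules can be enforced as a simple allow-list split rather than a prompt instruction. A sketch (tool names are illustrative, not a real API):

```python
# Hypothetical tool names; the point is the split, not the specific set.
READ_TOOLS = {"tickets.get", "customers.get", "kb.search", "policy.search"}
WRITE_TOOLS = {"email.send", "tickets.assign", "refunds.issue"}

def allowed_for_draft_phase(name: str) -> bool:
    """During drafting the model only ever gets read tools.

    Writes are rejected in code, not by prompt -- the model cannot talk
    its way into a send.
    """
    return name in READ_TOOLS and name not in WRITE_TOOLS
```

Wire this into the `SafeTools` allow-list so the drafting agent is constructed with `READ_TOOLS` only; the write tools belong to a separate, approval-gated code path.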
Citations (support drafts need receipts)
Support teams don’t want “a confident answer”. They want a defensible answer. If your agent can’t point at the policy/KB snippet it used, your reviewers can’t trust it.
We force the draft to include citations for anything that sounds like:
- a promise (“we will refund”)
- a timeline (“within 24 hours”)
- a policy (“you’re eligible for…”)
Then we validate those citations before approval.
draft = llm_write_reply(..., require_citations=True)  # (pseudo)
claims = llm_extract_claims(draft)  # returns [{"kind": "refund", "citation_id": "policy:refund-v3"}, ...]
for c in claims:
    if c["kind"] in {"refund", "sla", "credit"} and not c.get("citation_id"):
        raise RuntimeError("unsafe draft: missing citation for policy claim")

const draft = await llmWriteReply({ requireCitations: true }); // (pseudo)
const claims = await llmExtractClaims(draft); // [{ kind: "refund", citation_id: "policy:refund-v3" }, ...]
for (const c of claims) {
  if ((c.kind === "refund" || c.kind === "sla" || c.kind === "credit") && !c.citation_id) {
    throw new Error("unsafe draft: missing citation for policy claim");
  }
}

This doesn’t eliminate hallucinations. It makes them easier to catch. And it gives reviewers something better than “trust me bro”.
Guardrails that actually reduce incidents
“No commitments” mode
Support drafts should avoid making promises:
- don’t promise refunds
- don’t promise timelines
- don’t promise credits
Instead:
- “we’ll investigate”
- “we can do X if Y”
- “I’ve escalated this internally”
If you let the model promise things, it will promise things. It’s trying to be helpful.
PII / secrets redaction
If your customer profile includes secrets, redact before the model sees it. If you don’t, you’ll eventually paste a token into an email draft. Then you’ll enjoy rotating credentials at 03:00.
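A minimal sketch of the `redact` step used later in the pipeline. The field names and token pattern are assumptions; extend them for your own secret formats:

```python
import re

# Hypothetical token shapes and field names; adjust for your own systems.
TOKEN_RE = re.compile(r"\b(?:sk|tok|key)_[A-Za-z0-9]{8,}\b")
SECRET_FIELDS = {"api_key", "session_token", "password"}

def redact(customer: dict) -> dict:
    """Drop secret fields and mask token-like strings before the model sees them."""
    safe = {}
    for k, v in customer.items():
        if k in SECRET_FIELDS:
            continue  # never forward secret fields at all
        if isinstance(v, str):
            v = TOKEN_RE.sub("[REDACTED]", v)
        safe[k] = v
    return safe
```

Denylist-by-field plus pattern-masking is crude, but it is the cheap layer that catches the "token pasted into a note" case before the model ever sees it.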
Rate limits and budgets
Support traffic spikes are real. When the queue is on fire, costs go up fast. Budgets protect you from “helpful” retries during outages.
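The `Budget` dataclass above is just data; something has to enforce it. A hedged sketch of an enforcing guard (the class name and `tick()` interface are ours, not from any library):

```python
import time
from dataclasses import dataclass, field

@dataclass
class BudgetGuard:
    """Counts steps and wall-clock time; raises instead of retrying forever."""
    max_steps: int = 20
    max_seconds: float = 45.0
    steps: int = 0
    started: float = field(default_factory=time.monotonic)

    def tick(self) -> None:
        """Call once per tool call / model call; raises when the budget is spent."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("budget exceeded: too many steps")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("budget exceeded: out of time")
```

Calling `guard.tick()` inside the tool layer means a "helpful" retry loop during an outage dies after 20 steps instead of burning the bill.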
Production-style code (with artifacts + audit)
This is still simplified, but it shows the shape:
from dataclasses import dataclass
import time
import uuid
from typing import Any

@dataclass(frozen=True)
class Budget:
    max_steps: int = 20
    max_seconds: int = 45

def draft_support_reply(ticket_id: str, *, tools, budget: Budget) -> dict[str, Any]:
    request_id = uuid.uuid4().hex
    started = time.time()
    ticket = tools.call("tickets.get", args={"id": ticket_id}, request_id=request_id)
    customer = tools.call("customers.get", args={"id": ticket["customer_id"]}, request_id=request_id)
    policy = tools.call("policy.search", args={"q": "refund policy", "k": 5}, request_id=request_id)
    # redact before model
    safe_customer = redact(customer)  # (pseudo)
    draft = llm_write_reply(ticket=ticket, customer=safe_customer, policy=policy)  # (pseudo)
    artifact_id = tools.call(
        "artifacts.put",
        args={"type": "support_draft", "ticket_id": ticket_id, "draft": draft},
        request_id=request_id,
    )
    tools.call(
        "audit.emit",
        args={"type": "support.draft.created", "ticket_id": ticket_id, "artifact_id": artifact_id},
        request_id=request_id,
    )
    return {
        "ticket_id": ticket_id,
        "draft": draft,
        "artifact_id": artifact_id,
        "requires_human_approval": True,
        "request_id": request_id,
    }

import crypto from "node:crypto";
export async function draftSupportReply(ticketId, { tools, budget }) {
  void budget;
  const requestId = crypto.randomUUID().replace(/-/g, "");
  const started = Date.now();
  const ticket = await tools.call("tickets.get", { args: { id: ticketId }, requestId });
  const customer = await tools.call("customers.get", { args: { id: ticket.customer_id }, requestId });
  const policy = await tools.call("policy.search", { args: { q: "refund policy", k: 5 }, requestId });
  const safeCustomer = redact(customer); // (pseudo)
  const draft = await llmWriteReply({ ticket, customer: safeCustomer, policy }); // (pseudo)
  const artifactId = await tools.call(
    "artifacts.put",
    { args: { type: "support_draft", ticket_id: ticketId, draft }, requestId },
  );
  await tools.call(
    "audit.emit",
    { args: { type: "support.draft.created", ticket_id: ticketId, artifact_id: artifactId }, requestId },
  );
  void started;
  return { ticket_id: ticketId, draft, artifact_id: artifactId, requires_human_approval: true, request_id: requestId };
}

Yes, it’s more plumbing. But plumbing is cheaper than angry customers.
Real failure
We saw a team add “email.send” because “it’s just a draft anyway”. The model interpreted “send draft to customer” literally.
Impact:
- ~20 wrong emails sent in a day (not catastrophic, but embarrassing)
- hours of cleanup
- trust hit with support team (“don’t touch the bot”)
Fix:
- separate tool for “create draft” vs “send”
- require human approval for any write/send tool
- store drafts as artifacts with a clear audit trail
Why people do this wrong
- They optimize for automation rate instead of error rate.
- They mix “internal notes” and “customer response” in the same channel.
- They let the model decide when to send.
Trade-offs
- Human approval adds latency.
- You get fewer fully-automated resolutions.
- You also get fewer incidents. Worth it.
What we measure (so it doesn’t quietly get worse)
Support agents degrade over time because:
- product policies change
- templates change
- the ticket distribution changes (new issues)
We track a few boring metrics:
- % drafts approved without edits
- % drafts needing “major rewrite”
- # of “unsafe” suggestions caught in review (refund promises, policy violations)
- p95 runtime (if it spikes, the tool layer is probably failing)
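These metrics are cheap to compute from review records. A sketch, assuming each record carries an `outcome`, an `unsafe` flag, and a `runtime_s` (our field names, not a standard):

```python
def p95(values: list[float]) -> float:
    """Nearest-rank p95; good enough for a dashboard."""
    ordered = sorted(values)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

def draft_metrics(reviews: list[dict]) -> dict:
    """Summarize a non-empty batch of review records.

    Each record is assumed to look like:
    {"outcome": "approved" | "edited" | "major_rewrite",
     "unsafe": bool, "runtime_s": float}
    """
    n = len(reviews)
    return {
        "approved_no_edit_pct": 100 * sum(r["outcome"] == "approved" for r in reviews) / n,
        "major_rewrite_pct": 100 * sum(r["outcome"] == "major_rewrite" for r in reviews) / n,
        "unsafe_caught": sum(r["unsafe"] for r in reviews),
        "p95_runtime_s": p95([r["runtime_s"] for r in reviews]),
    }
```

Run it nightly over the review log and alert on trend, not on single values.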
If “major rewrite” rate climbs, don’t tune prompts first. Look at:
- tool context quality (are you fetching the right KB items?)
- policy snippets (are you feeding outdated rules?)
- redaction (did you remove the useful context by accident?)
A template strategy (so replies don’t sound like a bot)
The model is good at prose. It’s bad at consistency.
We give it structure:
- greeting
- short acknowledgement
- 1–3 bullets of actions taken / next steps
- questions (only if required)
- closing
Then we keep the model inside that structure. Not because we love templates. Because support teams hate surprises.
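"Keep the model inside the structure" can be a code check, not a prompt plea. A sketch, assuming the model emits the draft as a dict with one key per section (our schema, purely illustrative):

```python
# Hypothetical section schema for a structured draft.
REQUIRED_SECTIONS = ["greeting", "acknowledgement", "next_steps", "closing"]

def validate_structure(draft: dict) -> list[str]:
    """Return a list of structural problems; empty means the draft fits the template."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not draft.get(section):
            problems.append(f"missing section: {section}")
    bullets = draft.get("next_steps") or []
    if not (1 <= len(bullets) <= 3):
        problems.append("next_steps must be 1-3 bullets")
    return problems
```

If the list is non-empty, re-prompt or route to a human; never paper over a structural failure by sending freeform prose.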
When we consider auto-send (rarely)
Auto-send is a maturity milestone, not a starting point.
We only consider it when:
- the tool layer can prove the draft is safe (policy checks)
- the action is reversible or low-risk
- we’ve seen enough volume to trust the failure rate
And even then, we start with:
- internal tickets
- or low-tier customers
- or informational replies with no commitments
Approval UX (how to make humans actually approve)
If approvals feel annoying, people will either:
- rubber-stamp them
- or bypass the agent entirely
So we keep approvals lightweight:
- show the draft
- show the “claims” the draft makes (refund? SLA? escalation?)
- show which internal policy snippets were used
- show what tools were called (read-only trace)
Good approval UI answers:
- what will be sent?
- what are we promising?
- what’s the customer impact if this is wrong?
If your approval UI is “here’s 40 lines of JSON args”, it won’t work.
Escalation & handoff (so humans don’t start cold)
The goal isn’t “replace support”. The goal is “make the next human step faster”.
When the agent can’t safely draft (high-risk category, missing context, policy conflict), it should still produce a useful handoff artifact:
- 5–10 line summary of what the user reported
- suspected category (billing/bug/how-to)
- what data was pulled (account status, plan, last incidents)
- what it tried (KB hits, similar tickets)
- what it refused to do (writes, refunds) and why
- links to artifacts + trace (so a senior can audit quickly)
We’ve seen this cut handle time by ~20–40% on repetitive tickets, even when the agent never sends a single email.
handoff = {
    "ticket_id": ticket_id,
    "summary": summarize(ticket),
    "suspected_kind": classify_ticket(ticket),
    "kb_hits": [x["id"] for x in kb],
    "stop_reason": "high-risk: billing_refund",
}
tools.call("tickets.add_internal_note", args=handoff, request_id=request_id)
tools.call("tickets.assign", args={"id": ticket_id, "team": "billing"}, request_id=request_id)

const handoff = {
  ticket_id: ticketId,
  summary: summarize(ticket),
  suspected_kind: classifyTicket(ticket),
  kb_hits: kb.map((x) => x.id),
  stop_reason: "high-risk: billing_refund",
};
await tools.call("tickets.add_internal_note", { args: handoff, requestId });
await tools.call("tickets.assign", { args: { id: ticketId, team: "billing" }, requestId });

If your agent stops with “I can’t help”, it’s not an agent. It’s a fancy error message. We also attach the last ~20 tool calls to the handoff, because “trust” starts with “show me what it touched”. It helps in postmortems too.
Common edge cases (the ones that bite you)
Angry users
The model will try to de-escalate. Sometimes it does. Sometimes it says something that makes it worse.
We add a simple rule:
- be concise
- don’t argue
- don’t promise
Also: don’t let the model decide refunds or credits. That’s policy, not tone.
Multi-language tickets
If you operate globally, you’ll see:
- tickets in multiple languages
- internal notes in English
Make sure your pipeline is explicit:
- detect language
- draft in the customer’s language
- keep internal notes in your team language
Otherwise you’ll leak internal notes into customer-facing text.
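The routing step can be made explicit as a tiny function. A sketch; `detect_language` is injected (e.g. a small classifier or library) and is an assumption, not a real API:

```python
def route_languages(ticket: dict, *, detect_language) -> dict:
    """Explicit language routing: reply in the customer's language, notes in English.

    `detect_language` is a hypothetical injected callable that maps text to a
    language code (e.g. "es", "de").
    """
    customer_lang = detect_language(ticket["body"])
    return {
        "draft_language": customer_lang,  # customer-facing reply
        "notes_language": "en",           # internal notes stay in the team language
    }
```

The point of returning both explicitly: downstream code can assert that internal-note text never ends up in the customer draft's language channel.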
Attachments and screenshots
Support tickets often include screenshots. Those can contain secrets.
If you OCR images and feed the text into the model:
- redact aggressively
- log access (audit)
- keep budgets tight (OCR can be expensive)
If you can avoid it, avoid it.
Testing (yes, even for “just drafting”)
We run tests against:
- policy violations (refund promises, SLA guarantees)
- “unsafe” phrases (committing to actions we can’t do)
- PII leakage (tokens, internal links, secrets)
This isn’t perfect. But it catches the big failures.
If you ship without tests, you’ll learn about the failures from customers.
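A minimal sketch of what these checks look like as code. The patterns are examples, not a complete ruleset; extend them with your own policy language and token formats:

```python
import re

# Illustrative forbidden patterns: (regex, label).
FORBIDDEN = [
    (re.compile(r"\bwe(?: will|'ll) refund\b", re.I), "refund promise"),
    (re.compile(r"\bwithin \d+ (?:hours?|days?)\b", re.I), "timeline commitment"),
    (re.compile(r"\bsk_[A-Za-z0-9]{8,}\b"), "leaked token"),
]

def unsafe_phrases(draft: str) -> list[str]:
    """Return the labels of every forbidden pattern that appears in a draft."""
    return [label for pattern, label in FORBIDDEN if pattern.search(draft)]
```

Run it over every draft in CI against a corpus of past tickets, and as a gate before the approval UI; a non-empty result blocks the draft.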
When NOT to use this
Don’t use a support agent when:
- you can’t safely separate read vs write tooling
- you can’t review outputs
- you don’t have a place to store drafts + audit logs
Link it up
- Foundations: Tool calling
- Control layer: Tool permissions
- Failure mode: Infinite loop