AI Support Agent Example (With Code)

Draft replies fast without auto-sending disasters. A support agent that reads, summarizes, and drafts — with budgets, audit logs, and human approval.
On this page
  1. The problem
  2. Why this happens in real systems
  3. What breaks if you ignore it
  4. Code (safe-by-default)
  5. The real workflow (what we run, not what we demo)
  6. Triage before drafting (don’t put the agent on everything)
  7. Things the model is bad at (so we don’t let it do them)
  8. Citations (support drafts need receipts)
  9. Guardrails that actually reduce incidents
  10. “No commitments” mode
  11. PII / secrets redaction
  12. Rate limits and budgets
  13. Production-style code (with artifacts + audit)
  14. Real failure
  15. Why people do this wrong
  16. Trade-offs
  17. What we measure (so it doesn’t quietly get worse)
  18. A template strategy (so replies don’t sound like a bot)
  19. When we consider auto-send (rarely)
  20. Approval UX (how to make humans actually approve)
  21. Escalation & handoff (so humans don’t start cold)
  22. Common edge cases (the ones that bite you)
  23. Angry users
  24. Multi-language tickets
  25. Attachments and screenshots
  26. Testing (yes, even for “just drafting”)
  27. When NOT to use this
  28. Link it up

The problem

Support is where “helpful automation” goes to die.

You want an agent to draft replies because the queue is on fire. If it sends the wrong thing to the wrong customer, you’ll learn what “brand damage” actually means.

So we do this the boring way:

  • read context
  • draft a reply
  • do not send
  • ask a human to approve

Why this happens in real systems

Support tickets are messy:

  • incomplete info
  • angry users
  • account-specific context
  • internal policies the model will “summarize” into nonsense if you let it

Also: “send email” is a side effect. Side effects need policy.

What breaks if you ignore it

  • accidental sends (“we refunded you” when you didn’t)
  • leaking internal notes into a customer email
  • making commitments your team can’t keep

Code (safe-by-default)

PYTHON
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Budget:
    max_steps: int = 20
    max_seconds: int = 45


class SafeTools:
    def __init__(self, allow: set[str]):
        self.allow = allow

    def call(self, name: str, *, args: dict[str, Any]) -> Any:
        if name not in self.allow:
            raise RuntimeError(f"tool not allowed: {name}")
        return tool_impl(name, args=args)  # (pseudo)


def draft_support_reply(ticket_id: str, *, tools: SafeTools, budget: Budget) -> dict[str, Any]:
    ticket = tools.call("tickets.get", args={"id": ticket_id})
    customer = tools.call("customers.get", args={"id": ticket["customer_id"]})
    kb = tools.call("kb.search", args={"q": ticket["subject"], "k": 5})

    draft = llm_write_reply(ticket=ticket, customer=customer, kb=kb)  # (pseudo)

    return {
        "ticket_id": ticket_id,
        "draft": draft,
        "requires_human_approval": True,
    }
JAVASCRIPT
export class SafeTools {
  constructor({ allow }) {
    this.allow = allow;
  }

  async call(name, { args }) {
    if (!this.allow.has(name)) throw new Error("tool not allowed: " + name);
    return toolImpl(name, { args }); // (pseudo)
  }
}

export async function draftSupportReply(ticketId, { tools, budget }) {
  void budget;
  const ticket = await tools.call("tickets.get", { args: { id: ticketId } });
  const customer = await tools.call("customers.get", { args: { id: ticket.customer_id } });
  const kb = await tools.call("kb.search", { args: { q: ticket.subject, k: 5 } });

  const draft = await llmWriteReply({ ticket, customer, kb }); // (pseudo)

  return { ticket_id: ticketId, draft, requires_human_approval: true };
}

The real workflow (what we run, not what we demo)

Support automation fails when you skip the boring pipeline.

This is the pipeline we like:

  1. Fetch context (read-only)
  • ticket content + metadata (plan, tier, region, language)
  • customer profile (plan, history, last incident)
  • policy snippets (refund rules, SLA rules)
  2. Draft
  • draft a reply
  • draft internal notes (what to do next, what to check)
  3. Human approval
  • show the draft + citations to internal policy snippets
  • require explicit approval
  4. Send (separate tool)
  • send is a write-side effect, so it’s gated
  • idempotency key on send

If you merge steps 2 and 4, you will eventually auto-send something dumb. Not because the model is evil. Because models are literal and your tools are powerful.

Triage before drafting (don’t put the agent on everything)

Not every ticket deserves an “agent draft”. Some tickets are high-risk by definition:

  • billing/refunds/credits
  • account security (2FA, compromised accounts)
  • legal/compliance
  • outages (where your KB is wrong because the world is on fire)

If you blindly run the agent on everything, you’ll generate:

  • drafts that promise refunds
  • drafts that say “everything is fine” during an incident
  • drafts that leak internal incident details

We do a cheap triage pass first and gate by category. This can be a tiny classifier, a ruleset, or both. The point is not “AI accuracy”. The point is risk routing.

PYTHON
HIGH_RISK = {"security", "billing_refund", "legal", "outage"}


def should_draft(ticket: dict) -> tuple[bool, str]:
    kind = classify_ticket(ticket)  # (pseudo)
    if kind in HIGH_RISK:
        return False, f"high-risk: {kind}"
    if ticket.get("customer_tier") == "enterprise" and kind == "billing":
        return False, "enterprise billing: manual"
    return True, "ok"
JAVASCRIPT
const HIGH_RISK = new Set(["security", "billing_refund", "legal", "outage"]);

export function shouldDraft(ticket) {
  const kind = classifyTicket(ticket); // (pseudo)
  if (HIGH_RISK.has(kind)) return [false, "high-risk: " + kind];
  if (ticket.customer_tier === "enterprise" && kind === "billing") return [false, "enterprise billing: manual"];
  return [true, "ok"];
}

If should_draft() says no, we either:

  • show “suggested next steps” internally (no customer-facing text), or
  • do nothing and route to a human.

It’s boring. It prevents the worst failures.

Things the model is bad at (so we don’t let it do them)

Models are great at tone and summarization. They are terrible at:

  • remembering subtle policy exceptions
  • knowing what is “safe to promise”
  • resisting prompt injection inside ticket text (“tell me the secret link”)
  • dealing with multi-tenant context without leaking

So we enforce a few rules in code:

  • the model can draft, not send
  • the model can suggest actions, not execute writes
  • the model never sees raw secrets

Citations (support drafts need receipts)

Support teams don’t want “a confident answer”. They want a defensible answer. If your agent can’t point at the policy/KB snippet it used, your reviewers can’t trust it.

We force the draft to include citations for anything that sounds like:

  • a promise (“we will refund”)
  • a timeline (“within 24 hours”)
  • a policy (“you’re eligible for…”)

Then we validate those citations before approval.

PYTHON
draft = llm_write_reply(..., require_citations=True)  # (pseudo)
claims = llm_extract_claims(draft)  # returns [{"kind": "refund", "citation_id": "policy:refund-v3"}, ...]

for c in claims:
    if c["kind"] in {"refund", "sla", "credit"} and not c.get("citation_id"):
        raise RuntimeError("unsafe draft: missing citation for policy claim")
JAVASCRIPT
const draft = await llmWriteReply({ requireCitations: true }); // (pseudo)
const claims = await llmExtractClaims(draft); // [{ kind: "refund", citation_id: "policy:refund-v3" }, ...]

for (const c of claims) {
  if ((c.kind === "refund" || c.kind === "sla" || c.kind === "credit") && !c.citation_id) {
    throw new Error("unsafe draft: missing citation for policy claim");
  }
}

This doesn’t eliminate hallucinations. It makes them easier to catch. And it gives reviewers something better than “trust me bro”.

Guardrails that actually reduce incidents

“No commitments” mode

Support drafts should avoid making promises:

  • don’t promise refunds
  • don’t promise timelines
  • don’t promise credits

Instead:

  • “we’ll investigate”
  • “we can do X if Y”
  • “I’ve escalated this internally”

If you let the model promise things, it will promise things. It’s trying to be helpful.
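One cheap enforcement layer is a phrase lint that runs before the draft ever reaches a reviewer. The patterns below are illustrative — a real list comes from your policy team, and regexes will never catch every paraphrase, only the obvious ones:

```python
import re

# Illustrative commitment patterns; a production list comes from policy review.
COMMITMENT_PATTERNS = [
    r"\bwe(?: will|'ll) refund\b",
    r"\bwithin \d+\s*(?:hours?|days?)\b",
    r"\bguarantee[ds]?\b",
    r"\byou(?: will|'ll) receive a credit\b",
]


def find_commitments(draft: str) -> list[str]:
    """Return the commitment-sounding phrases found in a draft."""
    hits: list[str] = []
    for pat in COMMITMENT_PATTERNS:
        hits.extend(m.group(0) for m in re.finditer(pat, draft, re.IGNORECASE))
    return hits
```

If `find_commitments()` returns anything, block the draft or flag the phrase to the reviewer. Either way, the model doesn’t get to quietly promise.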

PII / secrets redaction

If your customer profile includes secrets, redact before the model sees it. If you don’t, you’ll eventually paste a token into an email draft. Then you’ll enjoy rotating credentials at 03:00.
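A minimal sketch of that redaction, assuming string input per field — the regexes below cover only obvious token/email/card shapes and are a floor, not a substitute for a real DLP pass; the `redact(customer)` used in the pipeline on this page would apply something like this to each field of the profile:

```python
import re

# Order matters: redact token-shaped strings before the looser patterns.
REDACTIONS = [
    (re.compile(r"\b(?:sk|pk|ghp)_[A-Za-z0-9]{16,}\b"), "[REDACTED_TOKEN]"),  # API-key-ish strings
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),             # email addresses
    (re.compile(r"\b\d{13,19}\b"), "[REDACTED_NUMBER]"),                      # long digit runs (card-like)
]


def redact(text: str) -> str:
    """Replace obvious secret/PII shapes before the model sees the text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Redaction is lossy on purpose: losing a little context is cheaper than rotating credentials at 03:00.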

Rate limits and budgets

Support traffic spikes are real. When the queue is on fire, costs go up fast. Budgets protect you from “helpful” retries during outages.
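Budgets are easiest to enforce in one place, at the tool boundary. A sketch with assumed knobs (the class and its defaults are illustrative, mirroring the `Budget` dataclass used on this page):

```python
import time


class BudgetExceeded(RuntimeError):
    pass


class BudgetedRun:
    """Counts tool steps and wall-clock time; raises instead of retrying forever."""

    def __init__(self, max_steps: int = 20, max_seconds: float = 45.0):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.steps = 0
        self.started = time.monotonic()

    def tick(self) -> None:
        """Call before every tool call; fails closed when a budget is spent."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step budget exhausted ({self.max_steps})")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded(f"time budget exhausted ({self.max_seconds}s)")
```

During an outage, `BudgetExceeded` is the difference between one noisy failure and a bill-shaped one.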

Production-style code (with artifacts + audit)

This is still simplified, but it shows the shape:

PYTHON
from dataclasses import dataclass
import time
import uuid
from typing import Any


@dataclass(frozen=True)
class Budget:
    max_steps: int = 20
    max_seconds: int = 45


def draft_support_reply(ticket_id: str, *, tools, budget: Budget) -> dict[str, Any]:
    request_id = uuid.uuid4().hex
    started = time.time()

    ticket = tools.call("tickets.get", args={"id": ticket_id}, request_id=request_id)
    customer = tools.call("customers.get", args={"id": ticket["customer_id"]}, request_id=request_id)
    policy = tools.call("policy.search", args={"q": "refund policy", "k": 5}, request_id=request_id)

    # enforce the time budget before the expensive LLM call
    if time.time() - started > budget.max_seconds:
        raise TimeoutError(f"budget exceeded ({budget.max_seconds}s)")

    # redact before model
    safe_customer = redact(customer)  # (pseudo)

    draft = llm_write_reply(ticket=ticket, customer=safe_customer, policy=policy)  # (pseudo)

    artifact_id = tools.call(
        "artifacts.put",
        args={"type": "support_draft", "ticket_id": ticket_id, "draft": draft},
        request_id=request_id,
    )

    tools.call(
        "audit.emit",
        args={"type": "support.draft.created", "ticket_id": ticket_id, "artifact_id": artifact_id},
        request_id=request_id,
    )

    return {
        "ticket_id": ticket_id,
        "draft": draft,
        "artifact_id": artifact_id,
        "requires_human_approval": True,
        "request_id": request_id,
    }
JAVASCRIPT
import crypto from "node:crypto";

export async function draftSupportReply(ticketId, { tools, budget }) {
  const requestId = crypto.randomUUID().replace(/-/g, "");
  const started = Date.now();

  const ticket = await tools.call("tickets.get", { args: { id: ticketId }, requestId });
  const customer = await tools.call("customers.get", { args: { id: ticket.customer_id }, requestId });
  const policy = await tools.call("policy.search", { args: { q: "refund policy", k: 5 }, requestId });

  // enforce the time budget before the expensive LLM call
  if (Date.now() - started > budget.max_seconds * 1000) {
    throw new Error(`budget exceeded (${budget.max_seconds}s)`);
  }

  const safeCustomer = redact(customer); // (pseudo)
  const draft = await llmWriteReply({ ticket, customer: safeCustomer, policy }); // (pseudo)

  const artifactId = await tools.call(
    "artifacts.put",
    { args: { type: "support_draft", ticket_id: ticketId, draft }, requestId },
  );

  await tools.call(
    "audit.emit",
    { args: { type: "support.draft.created", ticket_id: ticketId, artifact_id: artifactId }, requestId },
  );

  return { ticket_id: ticketId, draft, artifact_id: artifactId, requires_human_approval: true, request_id: requestId };
}

Yes, it’s more plumbing. But plumbing is cheaper than angry customers.

Real failure

We saw a team add “email.send” because “it’s just a draft anyway”. The model interpreted “send draft to customer” literally.

Impact:

  • ~20 wrong emails sent in a day (not catastrophic, but embarrassing)
  • hours of cleanup
  • trust hit with support team (“don’t touch the bot”)

Fix:

  • separate tool for “create draft” vs “send”
  • require human approval for any write/send tool
  • store drafts as artifacts with a clear audit trail

Why people do this wrong

  • They optimize for automation rate instead of error rate.
  • They mix “internal notes” and “customer response” in the same channel.
  • They let the model decide when to send.

Trade-offs

  • Human approval adds latency.
  • You get fewer fully-automated resolutions.
  • You also get fewer incidents. Worth it.

What we measure (so it doesn’t quietly get worse)

Support agents degrade over time because:

  • product policies change
  • templates change
  • the ticket distribution changes (new issues)

We track a few boring metrics:

  • % drafts approved without edits
  • % drafts needing “major rewrite”
  • # of “unsafe” suggestions caught in review (refund promises, policy violations)
  • p95 runtime (if it spikes, the tool layer is probably failing)

If “major rewrite” rate climbs, don’t tune prompts first. Look at:

  • tool context quality (are you fetching the right KB items?)
  • policy snippets (are you feeding outdated rules?)
  • redaction (did you remove the useful context by accident?)
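These rates fall out of review events you already store. A sketch, assuming each review record carries an `outcome` field (the outcome vocabulary here is invented for illustration):

```python
from collections import Counter


def draft_quality(reviews: list[dict]) -> dict[str, float]:
    """Summarize review outcomes into the rates worth alerting on.

    Assumes each review dict has "outcome" in:
    approved_unchanged | approved_edited | major_rewrite | rejected_unsafe
    """
    counts = Counter(r["outcome"] for r in reviews)
    total = max(len(reviews), 1)  # avoid divide-by-zero on an empty window
    return {
        "approved_without_edits": counts["approved_unchanged"] / total,
        "major_rewrite": counts["major_rewrite"] / total,
        "unsafe_caught": counts["rejected_unsafe"] / total,
    }
```

Compute these per week and alert on trend, not absolute value — the absolute numbers depend heavily on your ticket mix.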

A template strategy (so replies don’t sound like a bot)

The model is good at prose. It’s bad at consistency.

We give it structure:

  • greeting
  • short acknowledgement
  • 1–3 bullets of actions taken / next steps
  • questions (only if required)
  • closing

Then we keep the model inside that structure. Not because we love templates. Because support teams hate surprises.
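One way to keep the model inside the structure: have it fill named slots and let code, not prose, assemble the reply. A sketch using the section names from the list above (the slot schema is an assumption, not a real API):

```python
SECTIONS = ["greeting", "acknowledgement", "next_steps", "questions", "closing"]


def assemble_reply(slots: dict) -> str:
    """Render model-filled slots into the fixed reply shape.

    next_steps is a list (rendered as bullets); questions may be omitted.
    """
    missing = [s for s in SECTIONS if s != "questions" and not slots.get(s)]
    if missing:
        raise ValueError(f"draft missing required sections: {missing}")
    parts = [slots["greeting"], slots["acknowledgement"]]
    parts.extend(f"- {step}" for step in slots["next_steps"])
    if slots.get("questions"):
        parts.extend(slots["questions"])
    parts.append(slots["closing"])
    return "\n".join(parts)
```

The model never emits the final email; it emits slot contents, and a missing section is a hard error instead of a weird-looking reply.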

When we consider auto-send (rarely)

Auto-send is a maturity milestone, not a starting point.

We only consider it when:

  • the tool layer can prove the draft is safe (policy checks)
  • the action is reversible or low-risk
  • we’ve seen enough volume to trust the failure rate

And even then, we start with:

  • internal tickets
  • or low-tier customers
  • or informational replies with no commitments
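The conditions above translate into a policy check that has to pass before auto-send is even offered. A sketch — every field name and threshold here is a placeholder you would tune, not a known-good value:

```python
def can_auto_send(draft_meta: dict) -> tuple[bool, str]:
    """Gate auto-send on explicit, checkable conditions. Thresholds are illustrative."""
    if draft_meta.get("policy_violations"):
        return False, "failed policy checks"
    if not draft_meta.get("reversible", False):
        return False, "action not reversible/low-risk"
    if draft_meta.get("observed_volume", 0) < 1000:
        return False, "not enough volume to trust the failure rate"
    if draft_meta.get("commitments"):
        return False, "draft contains commitments"
    return True, "eligible"
```

Note the shape: the default answer is no, and every yes comes with a reason you can audit later.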

Approval UX (how to make humans actually approve)

If approvals feel annoying, people will either:

  • rubber-stamp them
  • or bypass the agent entirely

So we keep approvals lightweight:

  • show the draft
  • show the “claims” the draft makes (refund? SLA? escalation?)
  • show which internal policy snippets were used
  • show what tools were called (read-only trace)

Good approval UI answers:

  • what will be sent?
  • what are we promising?
  • what’s the customer impact if this is wrong?

If your approval UI is “here’s 40 lines of JSON args”, it won’t work.
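The reviewer card is just a projection of data the pipeline already produced. A sketch, with assumed field names on the claim/citation/trace records:

```python
def approval_card(draft: str, claims: list[dict], citations: list[dict], trace: list[dict]) -> dict:
    """Project a run into the three questions a reviewer actually asks."""
    return {
        # what will be sent?
        "will_send": draft,
        # what are we promising?
        "promises": [c for c in claims if c.get("kind") in {"refund", "sla", "credit"}],
        # which policy snippets backed it
        "sources": [c["id"] for c in citations],
        # read-only trace, not 40 lines of JSON args
        "tools_touched": [t["name"] for t in trace],
    }
```

Render that, not the raw run. Reviewers who can answer the three questions in ten seconds will keep approving; reviewers who can’t will rubber-stamp.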

Escalation & handoff (so humans don’t start cold)

The goal isn’t “replace support”. The goal is “make the next human step faster”.

When the agent can’t safely draft (high-risk category, missing context, policy conflict), it should still produce a useful handoff artifact:

  • 5–10 line summary of what the user reported
  • suspected category (billing/bug/how-to)
  • what data was pulled (account status, plan, last incidents)
  • what it tried (KB hits, similar tickets)
  • what it refused to do (writes, refunds) and why
  • links to artifacts + trace (so a senior can audit quickly)

We’ve seen this cut handle time by ~20–40% on repetitive tickets, even when the agent never sends a single email.

PYTHON
handoff = {
    "ticket_id": ticket_id,
    "summary": summarize(ticket),
    "suspected_kind": classify_ticket(ticket),
    "kb_hits": [x["id"] for x in kb],
    "stop_reason": "high-risk: billing_refund",
}
tools.call("tickets.add_internal_note", args=handoff, request_id=request_id)
tools.call("tickets.assign", args={"id": ticket_id, "team": "billing"}, request_id=request_id)
JAVASCRIPT
const handoff = {
  ticket_id: ticketId,
  summary: summarize(ticket),
  suspected_kind: classifyTicket(ticket),
  kb_hits: kb.map((x) => x.id),
  stop_reason: "high-risk: billing_refund",
};

await tools.call("tickets.add_internal_note", { args: handoff, requestId });
await tools.call("tickets.assign", { args: { id: ticketId, team: "billing" }, requestId });

If your agent stops with “I can’t help”, it’s not an agent. It’s a fancy error message.

We also attach the last ~20 tool calls to the handoff, because “trust” starts with “show me what it touched”. It helps in postmortems too.

Common edge cases (the ones that bite you)

Angry users

The model will try to de-escalate. Sometimes it does. Sometimes it says something that makes it worse.

We add a simple rule:

  • be concise
  • don’t argue
  • don’t promise

Also: don’t let the model decide refunds or credits. That’s policy, not tone.

Multi-language tickets

If you operate globally, you’ll see:

  • tickets in multiple languages
  • internal notes in English

Make sure your pipeline is explicit:

  • detect language
  • draft in the customer’s language
  • keep internal notes in your team language

Otherwise you’ll leak internal notes into customer-facing text.
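Making the split explicit in the return shape helps: decide the language of each output channel up front, so customer text and internal notes can never share one field. In this sketch, `detect` is an injected language detector (any library or model you already use); the function itself is invented for illustration:

```python
def plan_languages(ticket_text: str, detect, team_language: str = "en") -> dict:
    """Decide, up front, which language each output channel uses.

    detect: callable(text) -> ISO language code; injected so the sketch
    stays library-agnostic.
    """
    customer_language = detect(ticket_text)
    return {
        "customer_reply": customer_language,  # draft in the customer's language
        "internal_notes": team_language,      # notes always stay in the team language
    }
```

Downstream, the drafter receives `customer_reply` as a hard constraint and the note-writer receives `internal_notes` — mixing them becomes a type error, not a judgment call.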

Attachments and screenshots

Support tickets often include screenshots. Those can contain secrets.

If you OCR images and feed the text into the model:

  • redact aggressively
  • log access (audit)
  • keep budgets tight (OCR can be expensive)

If you can avoid it, avoid it.

Testing (yes, even for “just drafting”)

We run tests against:

  • policy violations (refund promises, SLA guarantees)
  • “unsafe” phrases (committing to actions we can’t do)
  • PII leakage (tokens, internal links, secrets)

This isn’t perfect. But it catches the big failures.

If you ship without tests, you’ll learn about the failures from customers.
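The check suite can start as plain assertions over drafts. The patterns below are illustrative stand-ins for your policy and DLP config, and the category names are invented for the sketch:

```python
import re

# Illustrative; a real set comes from policy review and your DLP tooling.
UNSAFE_PATTERNS = {
    "refund_promise": re.compile(r"\bwe(?: will|'ll) refund\b", re.I),
    "sla_guarantee": re.compile(r"\bguarantee[ds]?\b", re.I),
    "secret_leak": re.compile(r"\b(?:sk|ghp)_[A-Za-z0-9]{16,}\b"),
    "internal_link": re.compile(r"https?://internal\.", re.I),
}


def violations(draft: str) -> list[str]:
    """Name every unsafe-pattern category the draft trips."""
    return [name for name, pat in UNSAFE_PATTERNS.items() if pat.search(draft)]
```

Run `violations()` over a corpus of historical drafts in CI; a new prompt or model version that starts tripping `refund_promise` fails the build before it fails a customer.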

When NOT to use this

Don’t use a support agent when:

  • you can’t safely separate read vs write tooling
  • you can’t review outputs
  • you don’t have a place to store drafts + audit logs
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.