Hallucinated Sources in AI Agents (Failure Mode + Fixes + Code)

  • Spot the failure early before the bill climbs.
  • Learn what breaks in production and why.
  • Copy guardrails: budgets, stop reasons, validation.
  • Know when this isn’t the real root cause.
Detection signals
  • Tool calls per run spike (or the same call repeats with an identical args hash).
  • Spend or tokens per request climb without better outputs.
  • Retries shift from rare to constant (429/5xx).
Agents will confidently cite URLs they never fetched. Here’s why it happens in production and how to enforce evidence-backed citations.
On this page
  1. Quick take
  2. Problem-first intro
  3. Why this fails in production
  4. 1) The model is optimized to look helpful, not to be auditable
  5. 2) “Search results” are not “evidence”
  6. 3) Evidence gets lost between steps
  7. 4) “Cite sources” is a policy. Policies don’t enforce themselves.
  8. Implementation example (real code)
  9. Example incident (numbers are illustrative)
  10. Trade-offs
  11. When NOT to use
  12. Copy-paste checklist
  13. Safe default config snippet (JSON/YAML)
  14. FAQ
  15. Related pages
Interactive flow (scenario)

Normal path: execute → tool → observe.

Quick take

  • “Cite sources” is not enforceable unless citations are verified against captured tool evidence.
  • Make citations refer to source_ids (snapshots), not raw URLs.
  • Treat “search results” as discovery, not evidence (fetch before citing).
  • Fail closed (or degrade) when citations don’t verify.

Problem-first intro

Your agent produces a “well-sourced” answer.

Then someone clicks the sources.

One link 404s. Another is unrelated. A third is a PDF the agent clearly didn’t read (it’s 120 pages; the answer came back in 6 seconds).

Congrats — you’ve shipped a credibility bug.

In production this isn’t just embarrassing. It’s expensive:

  • Support and trust take a hit (“your docs are fake”).
  • Legal/compliance gets involved if you’re citing policies or regulations.
  • Your team burns hours doing “citation archaeology” in logs that don’t exist.

This failure mode shows up the moment you ask the model for “sources” without giving it a hard constraint on what counts as a source.

Why this fails in production

Hallucinated citations aren’t magical. They’re a predictable result of how we build agents.

1) The model is optimized to look helpful, not to be auditable

If the prompt says “include sources”, the model will include sources. Even if it has none. It’ll invent something plausible:

  • a domain that sounds right
  • a URL path that looks real
  • a document title that “should exist”

The model isn’t lying “on purpose”. It’s satisfying the shape of the output you asked for.

2) “Search results” are not “evidence”

Many agents do this:

  1. call search.read("x")
  2. get a list of titles + URLs
  3. answer with citations

But the agent didn’t fetch the pages. It doesn’t know the content. It only knows what the search snippet claims the page contains.

If you accept that as evidence, you’ll cite things you never read. Because you didn’t.
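
Concretely, keep the two phases separate in code. Here is a minimal sketch (the search and fetch callables and the result shapes are illustrative stand-ins for your tools, and the store is the evidence store from the implementation example below): a URL only becomes citable after it has been fetched and snapshotted.

PYTHON
def gather_evidence(query: str, *, store, search, fetch, max_pages: int = 3) -> list[str]:
    # Discovery: titles + URLs only. Nothing here is evidence yet.
    hits = search(query)

    source_ids: list[str] = []
    for hit in hits[:max_pages]:
        # Evidence: fetch the page, snapshot it, and keep the returned source_id.
        page = fetch(hit["url"])
        sid = store.add(url=hit["url"], title=page["title"], text=page["text"])
        source_ids.append(sid)
    return source_ids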

3) Evidence gets lost between steps

Even if you fetch pages, evidence often gets dropped:

  • tool output isn’t stored, only summarized
  • context gets truncated
  • a retry reorders results
  • a later step overwrites earlier sources

If you can’t trace “this sentence came from this document snapshot”, you don’t have citations. You have decoration.
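
One cheap countermeasure is to keep the trace outside the model context. Below is a minimal sketch (RunTrace and StepRecord are illustrative names, not part of the implementation example): each step records which source_ids it produced or relied on, so truncation, retries, or reordering in the prompt can’t silently drop the link between a claim and its snapshot.

PYTHON
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    step: int
    tool: str               # e.g. "http.get"
    source_ids: list[str]   # snapshots this step produced or relied on


@dataclass
class RunTrace:
    run_id: str
    steps: list[StepRecord] = field(default_factory=list)

    def record(self, step: int, tool: str, source_ids: list[str]) -> None:
        self.steps.append(StepRecord(step=step, tool=tool, source_ids=source_ids))

    def known_source_ids(self) -> set[str]:
        # The citation verifier can check final answers against this set,
        # independent of whatever survived in the context window.
        return {sid for s in self.steps for sid in s.source_ids}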

4) “Cite sources” is a policy. Policies don’t enforce themselves.

You can’t prompt your way into auditability. You need enforcement in code:

  • sources must come from tool outputs your system captured
  • citations must reference those captured sources
  • outputs without valid citations must fail (or degrade)

Here’s the pipeline you actually want:

Diagram: evidence-backed citations (fail closed).

Implementation example (real code)

The safest pattern we’ve found:

  • treat “sources” as IDs, not URLs
  • only allow citations that refer to snapshotted tool outputs
  • optionally: require a short quote/excerpt hash per citation
PYTHON
from __future__ import annotations

from dataclasses import dataclass
import hashlib
import time
from typing import Any


@dataclass(frozen=True)
class Evidence:
    source_id: str
    url: str
    fetched_at: float
    title: str
    text_sha256: str


class EvidenceStore:
    def __init__(self) -> None:
        self._items: dict[str, Evidence] = {}

    def add(self, *, url: str, title: str, text: str) -> str:
        sha = hashlib.sha256(text.encode("utf-8")).hexdigest()
        source_id = f"src_{len(self._items)+1:03d}"
        self._items[source_id] = Evidence(
            source_id=source_id,
            url=url,
            fetched_at=time.time(),
            title=title,
            text_sha256=sha,
        )
        return source_id

    def has(self, source_id: str) -> bool:
        return source_id in self._items

    def meta(self, source_id: str) -> Evidence:
        return self._items[source_id]


def verify_citations(*, cited_source_ids: list[str], store: EvidenceStore) -> None:
    missing = [s for s in cited_source_ids if not store.has(s)]
    if missing:
        raise ValueError(f"invalid citations (unknown source_ids): {missing}")


def answer_with_citations(task: str, *, store: EvidenceStore) -> dict[str, Any]:
    # In real code: the model returns structured output.
    # Example shape:
    # { "answer": "...", "citations": ["src_001", "src_002"] }
    out = llm_answer(task)  # (pseudo)
    verify_citations(cited_source_ids=out["citations"], store=store)
    return out


def render_sources(cited_ids: list[str], store: EvidenceStore) -> list[dict[str, str]]:
    sources: list[dict[str, str]] = []
    for sid in cited_ids:
        ev = store.meta(sid)
        sources.append(
            {
                "source_id": sid,
                "title": ev.title,
                "url": ev.url,
                "sha256": ev.text_sha256[:12],
            }
        )
    return sources
JAVASCRIPT
import crypto from "node:crypto";

export class EvidenceStore {
  constructor() {
    this.items = new Map();
  }

  add({ url, title, text }) {
    const sha = crypto.createHash("sha256").update(text, "utf8").digest("hex");
    const sourceId = "src_" + String(this.items.size + 1).padStart(3, "0");
    this.items.set(sourceId, { sourceId, url, title, fetchedAt: Date.now(), textSha256: sha });
    return sourceId;
  }

  has(sourceId) {
    return this.items.has(sourceId);
  }

  meta(sourceId) {
    const ev = this.items.get(sourceId);
    if (!ev) throw new Error("unknown source_id: " + sourceId);
    return ev;
  }
}

export function verifyCitations({ citedSourceIds, store }) {
  const missing = citedSourceIds.filter((s) => !store.has(s));
  if (missing.length) throw new Error("invalid citations (unknown source_ids): " + missing.join(", "));
}

export function answerWithCitations(task, { store }) {
  // Real code: the model returns structured output validated by schema.
  // Example shape:
  // { answer: "...", citations: ["src_001", "src_002"] }
  const out = llmAnswer(task); // (pseudo)
  verifyCitations({ citedSourceIds: out.citations || [], store });
  return out;
}

export function renderSources(citedIds, store) {
  return citedIds.map((sid) => {
    const ev = store.meta(sid);
    return { source_id: sid, title: ev.title, url: ev.url, sha256: ev.textSha256.slice(0, 12) };
  });
}

What this buys you:

  • citations can’t point to imaginary URLs
  • you can reproduce answers later (“here’s the snapshot hash”)
  • you can fail closed when citations don’t verify

If you want to go further, require an excerpt hash (or exact quote) per claim. It’s slower. It’s also harder to fake.
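
A minimal sketch of that stricter check (it assumes you snapshot the full page text rather than only its hash, and that each citation carries an exact quote; the field names are illustrative):

PYTHON
import hashlib


def verify_quotes(citations: list[dict], snapshot_texts: dict[str, str]) -> None:
    # citations: [{"source_id": "src_001", "quote": "..."}]
    # snapshot_texts: source_id -> full captured text
    for c in citations:
        text = snapshot_texts.get(c["source_id"], "")
        if c["quote"] not in text:
            raise ValueError(f"quote not found in snapshot {c['source_id']}")
        # Optionally keep a stable fingerprint of the quote for later audits.
        _ = hashlib.sha256(c["quote"].encode("utf-8")).hexdigest()[:12]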

Example incident (numbers are illustrative)

Example: an “internal research agent” generating weekly competitive summaries. It was asked to “include sources”.

What actually happened:

  • it cited a handful of credible-looking URLs
  • those URLs were not fetched by the agent
  • two of the links were dead
  • one was a completely unrelated press release

Impact:

  • a PM forwarded the doc to a partner (yikes)
  • we spent ~6 engineer-hours reconstructing which tool calls happened
  • we lost trust for a month (“cool demo, but I can’t use it”)

Fix:

  1. sources became source_ids tied to tool snapshots
  2. “search results” stopped counting as evidence
  3. answers without verified citations degraded to: “I can’t cite this reliably” (see the sketch below)

Dry lesson: if you don’t store evidence, you don’t have citations.
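
The degrade step (fix 3 above) is small in code. A minimal sketch, wrapping the answer_with_citations example from earlier:

PYTHON
def answer_or_degrade(task: str, *, store) -> dict:
    try:
        return answer_with_citations(task, store=store)
    except ValueError:
        # Citations didn't verify: say so instead of shipping unverified sources.
        return {
            "answer": "I can't cite this reliably with the evidence I have.",
            "citations": [],
            "degraded": True,
        }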

Trade-offs

  • Evidence snapshots cost storage and time.
  • Fail-closed citation verification reduces “answer rate” early on.
  • For some tasks, citations are unnecessary overhead (don’t force it everywhere).

When NOT to use

  • If the output is internal-only and doesn’t need citations, don’t add them “for vibes”.
  • If your system can’t fetch/store evidence safely (PII, secrets), don’t pretend citations are reliable.
  • If you’re doing deterministic lookups from a single source of truth, just link the source directly.

Copy-paste checklist

  • [ ] Treat citations as source_ids, not URLs
  • [ ] Store tool output snapshots (URL + hash + timestamp)
  • [ ] Disallow citations to unfetched URLs
  • [ ] Separate “search results” from “evidence”
  • [ ] Validate citations (fail closed or degrade)
  • [ ] Log run_id + source_ids + snapshot hashes
  • [ ] Add a retention policy for snapshots
  • [ ] Add a safe-mode: “answer without sources” if evidence isn’t available

Safe default config snippet (JSON/YAML)

YAML
citations:
  required: true
  evidence_sources: ["http.get", "kb.read"]
  allow_search_results_as_evidence: false
  fail_closed: true
  attach_snapshot_hash: true
  retention_days: 14
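
A minimal sketch of how these fields might gate behavior at runtime (the config dict mirrors the YAML above; tool_name is whatever tool produced a snapshot):

PYTHON
def is_evidence(tool_name: str, config: dict) -> bool:
    # Snapshots from these tools can back citations.
    if tool_name in config["citations"]["evidence_sources"]:
        return True
    # Search results count only if explicitly allowed (default: no).
    if tool_name == "search.read":
        return bool(config["citations"]["allow_search_results_as_evidence"])
    return False


def on_verification_failure(config: dict) -> str:
    # fail_closed: reject the answer; otherwise degrade to "no reliable citation".
    return "reject" if config["citations"]["fail_closed"] else "degrade"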

FAQ

Can’t I just tell the model to cite sources?
You can, but it’s not enforceable. Without a verifier tied to tool snapshots, citations are decoration.
Do I need to store full page text?
Not always. Start with URL + title + content hash + timestamp. Store full text if you need quotes or replay.
Are search results ever acceptable evidence?
Only if you’re comfortable citing things you didn’t read. In production: usually no.
What about private docs?
Same pattern. Use `source_id`s tied to `kb.read` snapshots. Don’t leak raw text into logs.

Not sure this is your use case?

Design your agent ->
Implement in OnceOnly
Guardrails for loops, retries, and spend escalation.
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
controls:
  loop_detection:
    enabled: true
    dedupe_by: [tool, args_hash]
  retries:
    max: 2
    backoff_ms: [200, 800]
stop_reasons:
  enabled: true
logging:
  tool_calls: { enabled: true, store_args: false, store_args_hash: true }
Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Kill switch & incident stop
  • Audit logs & traceability
  • Idempotency & dedupe
  • Tool permissions (allowlist / blocklist)
Integrated mention: OnceOnly is a control layer for production agent systems.
Example policy (concept)
# Example (Python — conceptual)
policy = {
  "budgets": {"steps": 20, "seconds": 60, "usd": 1.0},
  "controls": {"kill_switch": True, "audit": True},
}
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.