Hallucinated Sources in AI Agents (Failure Mode + Fixes + Code)

  • Spot the failure early before the bill climbs.
  • Learn what breaks in production and why.
  • Copy guardrails: budgets, stop reasons, validation.
  • Know when this isn’t the real root cause.
Detection signals
  • Tool calls per run spike (or the same call repeats with an identical args hash).
  • Spend or tokens per request climb without better outputs.
  • Retries shift from rare to constant (429/5xx).
Agents will confidently cite URLs they never fetched. Here’s why it happens in production and how to enforce evidence-backed citations.
On this page
  1. Quick take
  2. Problem-first intro
  3. Why this fails in production
  4. 1) The model is optimized to look helpful, not to be auditable
  5. 2) “Search results” are not “evidence”
  6. 3) Evidence gets lost between steps
  7. 4) “Cite sources” is a policy. Policies don’t enforce themselves.
  8. Implementation example (real code)
  9. Example incident (numbers are illustrative)
  10. Trade-offs
  11. When NOT to use
  12. Copy-paste checklist
  13. Safe default config snippet (JSON/YAML)
  14. FAQ
  15. Related pages
Interactive flow (scenario)

Normal path: execute → tool → observe.

Quick take

  • “Cite sources” is not enforceable unless citations are verified against captured tool evidence.
  • Make citations refer to source_ids (snapshots), not raw URLs.
  • Treat “search results” as discovery, not evidence (fetch before citing).
  • Fail closed (or degrade) when citations don’t verify.

Problem-first intro

Your agent produces a “well-sourced” answer.

Then someone clicks the sources.

One link 404s. Another is unrelated. A third is a PDF the agent clearly didn’t read (it’s 120 pages; the answer came back in 6 seconds).

Congrats — you’ve shipped a credibility bug.

In production this isn’t just embarrassing. It’s expensive:

  • Support and trust take a hit (“your docs are fake”).
  • Legal/compliance gets involved if you’re citing policies or regulations.
  • Your team burns hours doing “citation archaeology” in logs that don’t exist.

This failure mode shows up the moment you ask the model for “sources” without giving it a hard constraint on what counts as a source.

Why this fails in production

Hallucinated citations aren’t magical. They’re a predictable result of how we build agents.

1) The model is optimized to look helpful, not to be auditable

If the prompt says “include sources”, the model will include sources. Even if it has none. It’ll invent something plausible:

  • a domain that sounds right
  • a URL path that looks real
  • a document title that “should exist”

The model isn’t lying “on purpose”. It’s satisfying the shape of the output you asked for.

2) “Search results” are not “evidence”

Many agents do this:

  1. call search.read("x")
  2. get a list of titles + URLs
  3. answer with citations

But the agent didn’t fetch the pages. It doesn’t know the content. It only knows what the search snippet claims the page contains.

If you accept that as evidence, you’ll cite things you never read. Because you didn’t.
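
Concretely, keep the two phases separate in code. Here is a minimal sketch (the search and fetch callables and the result shapes are illustrative stand-ins for your tools, and the store is the evidence store from the implementation example below): a URL only becomes citable after it has been fetched and snapshotted.

PYTHON
def gather_evidence(query: str, *, store, search, fetch, max_pages: int = 3) -> list[str]:
    # Discovery: titles + URLs only. Nothing here is evidence yet.
    hits = search(query)

    source_ids: list[str] = []
    for hit in hits[:max_pages]:
        # Evidence: fetch the page, snapshot it, and keep the returned source_id.
        page = fetch(hit["url"])
        sid = store.add(url=hit["url"], title=page["title"], text=page["text"])
        source_ids.append(sid)
    return source_ids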

3) Evidence gets lost between steps

Even if you fetch pages, evidence often gets dropped:

  • tool output isn’t stored, only summarized
  • context gets truncated
  • a retry reorders results
  • a later step overwrites earlier sources

If you can’t trace “this sentence came from this document snapshot”, you don’t have citations. You have decoration.
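
One cheap countermeasure is to keep the trace outside the model context. Below is a minimal sketch (RunTrace and StepRecord are illustrative names, not part of the implementation example): each step records which source_ids it produced or relied on, so truncation, retries, or reordering in the prompt can’t silently drop the link between a claim and its snapshot.

PYTHON
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    step: int
    tool: str               # e.g. "http.get"
    source_ids: list[str]   # snapshots this step produced or relied on


@dataclass
class RunTrace:
    run_id: str
    steps: list[StepRecord] = field(default_factory=list)

    def record(self, step: int, tool: str, source_ids: list[str]) -> None:
        self.steps.append(StepRecord(step=step, tool=tool, source_ids=source_ids))

    def known_source_ids(self) -> set[str]:
        # The citation verifier can check final answers against this set,
        # independent of whatever survived in the context window.
        return {sid for s in self.steps for sid in s.source_ids}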

4) “Cite sources” is a policy. Policies don’t enforce themselves.

You can’t prompt your way into auditability. You need enforcement in code:

  • sources must come from tool outputs your system captured
  • citations must reference those captured sources
  • outputs without valid citations must fail (or degrade)

Here’s the pipeline you actually want:

Diagram: evidence-backed citations (fail closed).

Implementation example (real code)

The safest pattern we’ve found:

  • treat “sources” as IDs, not URLs
  • only allow citations that refer to snapshotted tool outputs
  • optionally: require a short quote/excerpt hash per citation
PYTHON
from __future__ import annotations

from dataclasses import dataclass
import hashlib
import time
from typing import Any


@dataclass(frozen=True)
class Evidence:
    source_id: str
    url: str
    fetched_at: float
    title: str
    text_sha256: str


class EvidenceStore:
    def __init__(self) -> None:
        self._items: dict[str, Evidence] = {}

    def add(self, *, url: str, title: str, text: str) -> str:
        sha = hashlib.sha256(text.encode("utf-8")).hexdigest()
        source_id = f"src_{len(self._items)+1:03d}"
        self._items[source_id] = Evidence(
            source_id=source_id,
            url=url,
            fetched_at=time.time(),
            title=title,
            text_sha256=sha,
        )
        return source_id

    def has(self, source_id: str) -> bool:
        return source_id in self._items

    def meta(self, source_id: str) -> Evidence:
        return self._items[source_id]


def verify_citations(*, cited_source_ids: list[str], store: EvidenceStore) -> None:
    missing = [s for s in cited_source_ids if not store.has(s)]
    if missing:
        raise ValueError(f"invalid citations (unknown source_ids): {missing}")


def answer_with_citations(task: str, *, store: EvidenceStore) -> dict[str, Any]:
    # In real code: the model returns structured output.
    # Example shape:
    # { "answer": "...", "citations": ["src_001", "src_002"] }
    out = llm_answer(task)  # (pseudo)
    verify_citations(cited_source_ids=out["citations"], store=store)
    return out


def render_sources(cited_ids: list[str], store: EvidenceStore) -> list[dict[str, str]]:
    sources: list[dict[str, str]] = []
    for sid in cited_ids:
        ev = store.meta(sid)
        sources.append(
            {
                "source_id": sid,
                "title": ev.title,
                "url": ev.url,
                "sha256": ev.text_sha256[:12],
            }
        )
    return sources
JAVASCRIPT
import crypto from "node:crypto";

export class EvidenceStore {
  constructor() {
    this.items = new Map();
  }

  add({ url, title, text }) {
    const sha = crypto.createHash("sha256").update(text, "utf8").digest("hex");
    const sourceId = "src_" + String(this.items.size + 1).padStart(3, "0");
    this.items.set(sourceId, { sourceId, url, title, fetchedAt: Date.now(), textSha256: sha });
    return sourceId;
  }

  has(sourceId) {
    return this.items.has(sourceId);
  }

  meta(sourceId) {
    const ev = this.items.get(sourceId);
    if (!ev) throw new Error("unknown source_id: " + sourceId);
    return ev;
  }
}

export function verifyCitations({ citedSourceIds, store }) {
  const missing = citedSourceIds.filter((s) => !store.has(s));
  if (missing.length) throw new Error("invalid citations (unknown source_ids): " + missing.join(", "));
}

export function answerWithCitations(task, { store }) {
  // Real code: the model returns structured output validated by schema.
  // Example shape:
  // { answer: "...", citations: ["src_001", "src_002"] }
  const out = llmAnswer(task); // (pseudo)
  verifyCitations({ citedSourceIds: out.citations || [], store });
  return out;
}

export function renderSources(citedIds, store) {
  return citedIds.map((sid) => {
    const ev = store.meta(sid);
    return { source_id: sid, title: ev.title, url: ev.url, sha256: ev.textSha256.slice(0, 12) };
  });
}

What this buys you:

  • citations can’t point to imaginary URLs
  • you can reproduce answers later (“here’s the snapshot hash”)
  • you can fail closed when citations don’t verify

If you want to go further, require an excerpt hash (or exact quote) per claim. It’s slower. It’s also harder to fake.
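
A minimal sketch of that stricter check (it assumes you snapshot the full page text rather than only its hash, and that each citation carries an exact quote; the field names are illustrative):

PYTHON
import hashlib


def verify_quotes(citations: list[dict], snapshot_texts: dict[str, str]) -> None:
    # citations: [{"source_id": "src_001", "quote": "..."}]
    # snapshot_texts: source_id -> full captured text
    for c in citations:
        text = snapshot_texts.get(c["source_id"], "")
        if c["quote"] not in text:
            raise ValueError(f"quote not found in snapshot {c['source_id']}")
        # Optionally keep a stable fingerprint of the quote for later audits.
        _ = hashlib.sha256(c["quote"].encode("utf-8")).hexdigest()[:12]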

Example incident (numbers are illustrative)

Example: an “internal research agent” generating weekly competitive summaries. It was asked to “include sources”.

What actually happened:

  • it cited a handful of credible-looking URLs
  • those URLs were not fetched by the agent
  • two of the links were dead
  • one was a completely unrelated press release

Impact:

  • a PM forwarded the doc to a partner (yikes)
  • we spent ~6 engineer-hours reconstructing which tool calls happened
  • we lost trust for a month (“cool demo, but I can’t use it”)

Fix:

  1. sources became source_ids tied to tool snapshots
  2. “search results” stopped counting as evidence
  3. answers without verified citations degraded to: “I can’t cite this reliably” (see the sketch below)

Dry lesson: if you don’t store evidence, you don’t have citations.
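
The degrade step (fix 3 above) is small in code. A minimal sketch, wrapping the answer_with_citations example from earlier:

PYTHON
def answer_or_degrade(task: str, *, store) -> dict:
    try:
        return answer_with_citations(task, store=store)
    except ValueError:
        # Citations didn't verify: say so instead of shipping unverified sources.
        return {
            "answer": "I can't cite this reliably with the evidence I have.",
            "citations": [],
            "degraded": True,
        }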

Trade-offs

  • Evidence snapshots cost storage and time.
  • Fail-closed citation verification reduces “answer rate” early on.
  • For some tasks, citations are unnecessary overhead (don’t force it everywhere).

When NOT to use

  • If the output is internal-only and doesn’t need citations, don’t add them “for vibes”.
  • If your system can’t fetch/store evidence safely (PII, secrets), don’t pretend citations are reliable.
  • If you’re doing deterministic lookups from a single source of truth, just link the source directly.

Copy-paste checklist

  • [ ] Treat citations as source_ids, not URLs
  • [ ] Store tool output snapshots (URL + hash + timestamp)
  • [ ] Disallow citations to unfetched URLs
  • [ ] Separate “search results” from “evidence”
  • [ ] Validate citations (fail closed or degrade)
  • [ ] Log run_id + source_ids + snapshot hashes
  • [ ] Add a retention policy for snapshots
  • [ ] Add a safe-mode: “answer without sources” if evidence isn’t available

Safe default config snippet (JSON/YAML)

YAML
citations:
  required: true
  evidence_sources: ["http.get", "kb.read"]
  allow_search_results_as_evidence: false
  fail_closed: true
  attach_snapshot_hash: true
  retention_days: 14
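
A minimal sketch of how these fields might gate behavior at runtime (the config dict mirrors the YAML above; tool_name is whatever tool produced a snapshot):

PYTHON
def is_evidence(tool_name: str, config: dict) -> bool:
    # Snapshots from these tools can back citations.
    if tool_name in config["citations"]["evidence_sources"]:
        return True
    # Search results count only if explicitly allowed (default: no).
    if tool_name == "search.read":
        return bool(config["citations"]["allow_search_results_as_evidence"])
    return False


def on_verification_failure(config: dict) -> str:
    # fail_closed: reject the answer; otherwise degrade to "no reliable citation".
    return "reject" if config["citations"]["fail_closed"] else "degrade"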

FAQ

Can’t I just tell the model to cite sources?
You can, but it’s not enforceable. Without a verifier tied to tool snapshots, citations are decoration.
Do I need to store full page text?
Not always. Start with URL + title + content hash + timestamp. Store full text if you need quotes or replay.
Are search results ever acceptable evidence?
Only if you’re comfortable citing things you didn’t read. In production: usually no.
What about private docs?
Same pattern. Use `source_id`s tied to `kb.read` snapshots. Don’t leak raw text into logs.

Not sure this is your use case?

Design your agent ->
Implement in OnceOnly
Guardrails for loops, retries, and spend escalation.
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
controls:
  loop_detection:
    enabled: true
    dedupe_by: [tool, args_hash]
  retries:
    max: 2
    backoff_ms: [200, 800]
stop_reasons:
  enabled: true
logging:
  tool_calls: { enabled: true, store_args: false, store_args_hash: true }
Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Kill switch & incident stop
  • Audit logs & traceability
  • Idempotency & dedupe
  • Tool permissions (allowlist / blocklist)
Integrated mention: OnceOnly is a control layer for production agent systems.
Example policy (concept)
# Example (Python — conceptual)
policy = {
  "budgets": {"steps": 20, "seconds": 60, "usd": 1.0},
  "controls": {"kill_switch": True, "audit": True},
}
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.