ReAct Agent β€” Python (full implementation with LLM)

Production-style runnable ReAct agent example in Python with action schema, tool allowlist, budgets, loop detection, and stop reasons.
On this page
  1. Pattern Essence (Brief)
  2. What this example demonstrates
  3. Architecture
  4. Project structure
  5. How to run
  6. Task
  7. Solution
  8. Code
  9. tools.py β€” tools (source of facts)
  10. gateway.py β€” policy boundary (the most important layer)
  11. llm.py β€” decision step (Think)
  12. main.py β€” full ReAct loop
  13. requirements.txt
  14. Example output
  15. Why this is ReAct and not just tool calling
  16. Typical stop_reason values
  17. What is NOT shown here
  18. What to try next
  19. Full code on GitHub

Pattern Essence (Brief)

ReAct Agent is a pattern where the agent works iteratively: thinks, chooses an action, executes it, and analyzes the result before the next step.

The model makes decisions at each iteration, while tool execution goes through a controlled gateway with action validation, allowlist, and runtime budgets.


What this example demonstrates

  • full Think -> Act -> Observe loop
  • separate policy boundary between decision (LLM) and tools (execution layer)
  • strict action format: only tool or final
  • tool allowlist (deny by default)
  • run budgets: max_steps, max_tool_calls, max_seconds
  • loop detection for repeated calls of the same tool with the same args
  • explicit stop_reason values for debugging and production monitoring

Architecture

  1. LLM receives the goal + step history and returns a JSON action.
  2. The system validates the action (validate_action).
  3. If it is a tool, ToolGateway checks allowlist/budgets/loop detection and executes the tool.
  4. Observation is added to history and becomes new evidence for the next decision step.
  5. If it is final, the run ends with stop_reason="success".

The LLM returns intent (a JSON action), which is treated as untrusted input: the policy boundary validates it first and only then, if allowed, calls tools.

This keeps ReAct controllable: the model makes decisions, and policy logic controls execution.
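
The steps above can be sketched as a minimal offline loop. This is an illustration only, not the example code: decide() is a hard-coded stand-in for the LLM, and the tool and allowlist are toy placeholders.

```python
# Minimal, self-contained sketch of steps 1-5 above.
# decide() stands in for the LLM and is hard-coded for illustration.

def decide(history):
    # Think: the real LLM returns a JSON action based on goal + history.
    if not history:
        return {"kind": "tool", "name": "get_time", "args": {}}
    return {"kind": "final", "answer": "done"}

TOOLS = {"get_time": lambda: {"time": "12:00"}}   # execution layer
ALLOW = {"get_time"}                              # deny by default

def run(max_steps=5):
    history = []
    for _ in range(max_steps):
        action = decide(history)                               # 1. decide
        if action["kind"] == "final":                          # 5. finish
            return {"stop_reason": "success", "answer": action["answer"]}
        if action["name"] not in ALLOW:                        # 3. policy check
            return {"stop_reason": f"tool_denied:{action['name']}"}
        observation = TOOLS[action["name"]](**action["args"])  # 3. execute
        history.append({"action": action, "observation": observation})  # 4. observe
    return {"stop_reason": "max_steps"}

print(run())
```

The full example below adds what this sketch omits: action validation, budgets, loop detection, and timeouts.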


Project structure

TEXT
examples/
└── agent-patterns/
    └── react-agent/
        └── python/
            β”œβ”€β”€ main.py           # ReAct loop
            β”œβ”€β”€ llm.py            # LLM decision step (JSON action)
            β”œβ”€β”€ gateway.py        # policy boundary: validation, allowlist, budgets, loop detection
            β”œβ”€β”€ tools.py          # deterministic tools (Anna/Max, USD, policies)
            └── requirements.txt

How to run

BASH
git clone https://github.com/AgentPatterns-tech/agentpatterns.git
cd agentpatterns

cd examples/agent-patterns/react-agent/python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Python 3.11+ is required.

Option via export:

BASH
export OPENAI_API_KEY="sk-..."
# optional:
# export OPENAI_MODEL="gpt-4.1-mini"
# export OPENAI_TIMEOUT_SECONDS="60"

python main.py

Option via .env (optional)

BASH
cat > .env <<'EOF'
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4.1-mini
OPENAI_TIMEOUT_SECONDS=60
EOF

set -a
source .env
set +a

python main.py

This is the shell variant (macOS/Linux). On Windows, set environment variables with set (cmd) or $env: (PowerShell), or use python-dotenv to load .env automatically.


Task

Imagine a user writes to support:

"Can I get a refund for my subscription right now?"

The agent should not answer immediately. It must:

  • gather facts through tools (profile, billing, policy)
  • decide what to do next after each step
  • provide the final answer only when facts are sufficient

Solution

Here the agent works step by step (ReAct):

  • at each step, the model chooses: call a tool or finish
  • the system checks that the action is valid and allowed
  • the tool returns a result that is added to history
  • based on this history, the agent makes the next step
  • when data is sufficient, the agent returns a short final answer

Code

tools.py β€” tools (source of facts)

PYTHON
from __future__ import annotations

from typing import Any

USERS = {
    42: {"id": 42, "name": "Anna", "country": "US", "tier": "pro"},
    7: {"id": 7, "name": "Max", "country": "US", "tier": "free"},
}

BILLING = {
    42: {
        "currency": "USD",
        "plan": "pro_monthly",
        "price_usd": 49.0,
        "days_since_first_payment": 10,
    },
    7: {
        "currency": "USD",
        "plan": "free",
        "price_usd": 0.0,
        "days_since_first_payment": 120,
    },
}

POLICY_DOCS = [
    {
        "id": "refund-v3",
        "title": "Refund Policy",
        "snippet": "Pro monthly subscriptions are refundable within 14 days from the first payment.",
    },
    {
        "id": "free-v1",
        "title": "Free Plan Policy",
        "snippet": "Free plan has no billable payments and cannot be refunded.",
    },
    {
        "id": "billing-v2",
        "title": "Billing Rules",
        "snippet": "All refunds are returned to the original payment method in USD.",
    },
]


def get_user_profile(user_id: int) -> dict[str, Any]:
    user = USERS.get(user_id)
    if not user:
        return {"error": f"user {user_id} not found"}
    return {"user": user}


def get_user_billing(user_id: int) -> dict[str, Any]:
    billing = BILLING.get(user_id)
    if not billing:
        return {"error": f"billing record for user {user_id} not found"}
    return {"billing": billing}


def search_policy(query: str) -> dict[str, Any]:
    words = [w for w in query.lower().split() if w]

    def score(doc: dict[str, str]) -> int:
        haystack = f"{doc['title']} {doc['snippet']}".lower()
        return sum(1 for word in words if word in haystack)

    ranked = sorted(POLICY_DOCS, key=score, reverse=True)
    top = [doc for doc in ranked if score(doc) > 0][:2]
    if not top:
        top = POLICY_DOCS[:1]

    return {"matches": top}

What matters most here (plain words)

  • Tools are deterministic and contain no LLM logic. The agent only decides which tool to call, but does not execute business logic itself.
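
To see that determinism in isolation, here is a trimmed replica of search_policy's keyword scoring (the doc set is reduced to two illustrative entries; the real tool returns full doc dicts, not just ids):

```python
# Keyword ranking in the style of search_policy: count query words
# found in each doc, sort by score, keep only docs that matched.
DOCS = [
    {"id": "refund-v3", "text": "Refund Policy: refundable within 14 days"},
    {"id": "free-v1", "text": "Free Plan Policy: cannot be refunded"},
]

def search(query):
    words = query.lower().split()

    def score(doc):
        return sum(1 for w in words if w in doc["text"].lower())

    ranked = sorted(DOCS, key=score, reverse=True)
    return [d["id"] for d in ranked if score(d) > 0]
```

The same query always produces the same ranking, which keeps tool outputs reproducible across runs.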

gateway.py β€” policy boundary (the most important layer)

PYTHON
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


class StopRun(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


@dataclass(frozen=True)
class Budget:
    max_steps: int = 8
    max_tool_calls: int = 6
    max_seconds: int = 20


def _stable_json(value: Any) -> str:
    if value is None or isinstance(value, (bool, int, float, str)):
        return json.dumps(value, ensure_ascii=True, sort_keys=True)
    if isinstance(value, list):
        return "[" + ",".join(_stable_json(item) for item in value) + "]"
    if isinstance(value, dict):
        parts = []
        for key in sorted(value):
            parts.append(
                json.dumps(str(key), ensure_ascii=True) + ":" + _stable_json(value[key])
            )
        return "{" + ",".join(parts) + "}"
    return json.dumps(str(value), ensure_ascii=True)


def args_hash(args: dict[str, Any]) -> str:
    raw = _stable_json(args or {})
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def validate_action(action: Any) -> dict[str, Any]:
    if not isinstance(action, dict):
        raise StopRun("invalid_action:not_object")

    kind = action.get("kind")
    if kind == "invalid":
        raise StopRun("invalid_action:bad_json")
    if kind not in {"tool", "final"}:
        raise StopRun("invalid_action:bad_kind")

    if kind == "final":
        allowed = {"kind", "answer"}
        extra = set(action.keys()) - allowed
        if extra:
            raise StopRun("invalid_action:extra_keys")
        answer = action.get("answer")
        if not isinstance(answer, str) or not answer.strip():
            raise StopRun("invalid_action:missing_answer")
        return {"kind": "final", "answer": answer.strip()}

    allowed = {"kind", "name", "args"}
    extra = set(action.keys()) - allowed
    if extra:
        raise StopRun("invalid_action:extra_keys")

    name = action.get("name")
    if not isinstance(name, str) or not name:
        raise StopRun("invalid_action:missing_tool_name")

    args = action.get("args", {})
    if args is None:
        args = {}
    if not isinstance(args, dict):
        raise StopRun("invalid_action:bad_args")

    return {"kind": "tool", "name": name, "args": args}


class ToolGateway:
    def __init__(
        self,
        *,
        allow: set[str],
        registry: dict[str, Callable[..., dict[str, Any]]],
        budget: Budget,
    ):
        self.allow = set(allow)
        self.registry = registry
        self.budget = budget
        self.tool_calls = 0
        self.seen_calls: set[str] = set()

    def call(self, name: str, args: dict[str, Any]) -> dict[str, Any]:
        self.tool_calls += 1
        if self.tool_calls > self.budget.max_tool_calls:
            raise StopRun("max_tool_calls")

        if name not in self.allow:
            raise StopRun(f"tool_denied:{name}")

        tool = self.registry.get(name)
        if tool is None:
            raise StopRun(f"tool_missing:{name}")

        signature = f"{name}:{args_hash(args)}"
        if signature in self.seen_calls:
            raise StopRun("loop_detected")
        self.seen_calls.add(signature)

        try:
            return tool(**args)
        except TypeError as exc:
            raise StopRun(f"tool_bad_args:{name}") from exc
        except Exception as exc:
            raise StopRun(f"tool_error:{name}") from exc

What matters most here (plain words)

  • validate_action(...) is the governance/control layer: the system accepts only the allowed action contract and rejects extra fields (invalid_action:extra_keys).
  • Budget + StopRun(...) is a production pattern for controlled shutdown: runs do not drift indefinitely, they stop with a clear reason.
  • ToolGateway.call(...) enforces the agent ≠ executor boundary: the agent only proposes an action, while the actual tool call is done by a controlled system layer.
  • loop_detected catches exact repeats (same tool + same args). Semantic loop detection is a separate option (see β€œWhat to try next”).
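
The budget mechanics can be exercised without the LLM. MiniGateway below is a stripped-down stand-in for ToolGateway that keeps only the max_tool_calls check, to show how StopRun turns an exhausted budget into an explicit stop reason:

```python
# Minimal re-creation of the budget check from ToolGateway.call.
class StopRun(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason

class MiniGateway:
    def __init__(self, max_tool_calls: int):
        self.max_tool_calls = max_tool_calls
        self.tool_calls = 0

    def call(self, tool, **args):
        # Same order as ToolGateway.call: count first, then compare.
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise StopRun("max_tool_calls")
        return tool(**args)

gw = MiniGateway(max_tool_calls=1)
first = gw.call(lambda: {"ok": True})   # within budget
try:
    gw.call(lambda: {"ok": True})       # exceeds the budget
    stop_reason = None
except StopRun as exc:
    stop_reason = exc.reason

print(stop_reason)
```

Note that the counter increments before the comparison, so the Nth call is the last one allowed and call N+1 stops the run.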

llm.py β€” decision step (Think)

The LLM sees only the catalog of available tools; if it requests a tool outside the allowlist, the gateway stops the run.

PYTHON
from __future__ import annotations

import json
import os
from typing import Any

from openai import APIConnectionError, APITimeoutError, OpenAI

MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1-mini")
LLM_TIMEOUT_SECONDS = float(os.getenv("OPENAI_TIMEOUT_SECONDS", "60"))


class LLMTimeout(Exception):
    pass

SYSTEM_PROMPT = """
You are a ReAct decision engine.
Return only one JSON object with one of these shapes:
1) {"kind":"tool","name":"<tool_name>","args":{...}}
2) {"kind":"final","answer":"<short final answer>"}

Rules:
- Use tools when you do not have enough facts.
- Do not invent tool outputs.
- Prefer the smallest next step.
- When evidence is sufficient, return "final".
- Never output markdown or extra keys.
""".strip()

TOOL_CATALOG = [
    {
        "name": "get_user_profile",
        "description": "Get user profile by user_id",
        "args": {"user_id": "integer"},
    },
    {
        "name": "get_user_billing",
        "description": "Get billing info by user_id",
        "args": {"user_id": "integer"},
    },
    {
        "name": "search_policy",
        "description": "Search refund and billing policy snippets",
        "args": {"query": "string"},
    },
]


def _get_client() -> OpenAI:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise EnvironmentError(
            "OPENAI_API_KEY is not set. Run: export OPENAI_API_KEY='sk-...'"
        )
    return OpenAI(api_key=api_key)


def _build_state_summary(history: list[dict[str, Any]]) -> dict[str, Any]:
    tools_used = [
        step.get("action", {}).get("name")
        for step in history
        if isinstance(step, dict)
        and isinstance(step.get("action"), dict)
        and step.get("action", {}).get("kind") == "tool"
    ]
    last_observation = history[-1].get("observation") if history else None
    return {
        "steps_completed": len(history),
        "tools_used": tools_used,
        "last_observation": last_observation,
    }


def decide_next_action(goal: str, history: list[dict[str, Any]]) -> dict[str, Any]:
    # Keep full history in memory, but send summary + last N steps
    # so the prompt remains stable as runs get longer.
    recent_history = history[-3:]
    payload = {
        "goal": goal,
        "state_summary": _build_state_summary(history),
        "recent_history": recent_history,
        "available_tools": TOOL_CATALOG,
    }

    client = _get_client()
    try:
        completion = client.chat.completions.create(
            model=MODEL,
            temperature=0,
            timeout=LLM_TIMEOUT_SECONDS,
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": json.dumps(payload, ensure_ascii=True)},
            ],
        )
    except (APITimeoutError, APIConnectionError) as exc:
        raise LLMTimeout("llm_timeout") from exc

    text = completion.choices[0].message.content or "{}"
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"kind": "invalid", "raw": text}

What matters most here (plain words)

  • timeout=LLM_TIMEOUT_SECONDS + LLMTimeout is a production pattern: if the model hangs, the run ends with an explicit llm_timeout instead of blocking.
  • state_summary + recent_history is a production scaling pattern: context grows in a controlled way instead of expanding without limit at each step.
  • SYSTEM_PROMPT defines only the intent format (tool/final): the LLM decides what to do, but does not execute tools itself.
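
The summarization step can be exercised in isolation. The function below is copied from llm.py; the sample history entry is illustrative:

```python
# _build_state_summary keeps only tool names and the last observation,
# so the prompt stays bounded as the run gets longer.
from typing import Any

def _build_state_summary(history: list[dict[str, Any]]) -> dict[str, Any]:
    tools_used = [
        step.get("action", {}).get("name")
        for step in history
        if isinstance(step, dict)
        and isinstance(step.get("action"), dict)
        and step.get("action", {}).get("kind") == "tool"
    ]
    last_observation = history[-1].get("observation") if history else None
    return {
        "steps_completed": len(history),
        "tools_used": tools_used,
        "last_observation": last_observation,
    }

history = [
    {
        "step": 1,
        "action": {"kind": "tool", "name": "get_user_profile", "args": {"user_id": 42}},
        "observation": {"user": {"id": 42, "name": "Anna"}},
    },
]
summary = _build_state_summary(history)
print(summary)
```

Only the tool names and the most recent observation survive into the summary; earlier observations are carried by recent_history, not repeated in full.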

main.py β€” full ReAct loop

PYTHON
from __future__ import annotations

import json
import time
from typing import Any

from gateway import Budget, StopRun, ToolGateway, args_hash, validate_action
from llm import LLMTimeout, decide_next_action
from tools import get_user_billing, get_user_profile, search_policy

GOAL = (
    "User 42 asked: Can I get a refund now? "
    "Use tools to verify profile, billing state, and policy. "
    "Return a short final answer in English with USD amount and reason."
)

BUDGET = Budget(max_steps=8, max_tool_calls=5, max_seconds=20)

TOOL_REGISTRY = {
    "get_user_profile": get_user_profile,
    "get_user_billing": get_user_billing,
    "search_policy": search_policy,
}

ALLOWED_TOOLS = {"get_user_profile", "get_user_billing", "search_policy"}


def run_react(goal: str) -> dict[str, Any]:
    started = time.monotonic()
    history: list[dict[str, Any]] = []
    trace: list[dict[str, Any]] = []

    gateway = ToolGateway(allow=ALLOWED_TOOLS, registry=TOOL_REGISTRY, budget=BUDGET)

    for step in range(1, BUDGET.max_steps + 1):
        elapsed = time.monotonic() - started
        if elapsed > BUDGET.max_seconds:
            return {
                "status": "stopped",
                "stop_reason": "max_seconds",
                "trace": trace,
                "history": history,
            }

        try:
            raw_action = decide_next_action(goal=goal, history=history)
        except LLMTimeout:
            return {
                "status": "stopped",
                "stop_reason": "llm_timeout",
                "trace": trace,
                "history": history,
            }

        try:
            action = validate_action(raw_action)
        except StopRun as exc:
            return {
                "status": "stopped",
                "stop_reason": exc.reason,
                "raw_action": raw_action,
                "trace": trace,
                "history": history,
            }

        if action["kind"] == "final":
            return {
                "status": "ok",
                "stop_reason": "success",
                "answer": action["answer"],
                "trace": trace,
                "history": history,
            }

        tool_name = action["name"]
        tool_args = action["args"]

        try:
            observation = gateway.call(tool_name, tool_args)
            trace.append(
                {
                    "step": step,
                    "tool": tool_name,
                    "args_hash": args_hash(tool_args),
                    "ok": True,
                }
            )
        except StopRun as exc:
            trace.append(
                {
                    "step": step,
                    "tool": tool_name,
                    "args_hash": args_hash(tool_args),
                    "ok": False,
                    "stop_reason": exc.reason,
                }
            )
            return {
                "status": "stopped",
                "stop_reason": exc.reason,
                "trace": trace,
                "history": history,
            }

        history.append(
            {
                "step": step,
                "action": action,
                "observation": observation,
            }
        )

    return {
        "status": "stopped",
        "stop_reason": "max_steps",
        "trace": trace,
        "history": history,
    }


def main() -> None:
    result = run_react(GOAL)
    print(json.dumps(result, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()

What matters most here (plain words)

  • run_react(...) controls the loop and stop conditions; business actions are executed only through ToolGateway.
  • validate_action(...) and gateway.call(...) inside the loop are the governance/control layer in action at every step.
  • Splitting decide_next_action(...) and gateway.call(...) is the key principle of agent β‰  executor: the agent returns intent, and tools are called only through the policy boundary.

requirements.txt

TEXT
openai==2.21.0

Example output

Tool call order may vary slightly between runs, but stop_reason and policy gates (allowlist, budget, validation) remain stable.

JSON
{
  "status": "ok",
  "stop_reason": "success",
  "answer": "Yes, you can get a refund of USD 49.00 ...",
  "trace": [
    {"step": 1, "tool": "get_user_profile", "args_hash": "...", "ok": true},
    {"step": 2, "tool": "get_user_billing", "args_hash": "...", "ok": true},
    {"step": 3, "tool": "search_policy", "args_hash": "...", "ok": true}
  ],
  "history": [{...}]
}

history is the step execution log: for each step, it stores action (what the agent decided to do) and observation (what the tool returned).

args_hash is a hash of the arguments only, so for the same user_id it may match across different tools; the loop guard therefore checks the tool + args_hash combination.
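
Because _stable_json sorts keys recursively, args_hash is independent of dict key order, which is what makes the loop guard reliable. Both functions below are copied from gateway.py:

```python
# args_hash from gateway.py: recursive key sorting means the same
# arguments always hash to the same 12-hex-char signature.
import hashlib
import json
from typing import Any

def _stable_json(value: Any) -> str:
    if value is None or isinstance(value, (bool, int, float, str)):
        return json.dumps(value, ensure_ascii=True, sort_keys=True)
    if isinstance(value, list):
        return "[" + ",".join(_stable_json(item) for item in value) + "]"
    if isinstance(value, dict):
        parts = []
        for key in sorted(value):
            parts.append(
                json.dumps(str(key), ensure_ascii=True) + ":" + _stable_json(value[key])
            )
        return "{" + ",".join(parts) + "}"
    return json.dumps(str(value), ensure_ascii=True)

def args_hash(args: dict[str, Any]) -> str:
    raw = _stable_json(args or {})
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]

a = args_hash({"user_id": 42, "query": "refund"})
b = args_hash({"query": "refund", "user_id": 42})  # same args, different order
```

a and b are equal: key order does not change the signature, so a repeated call cannot sneak past the loop guard by reordering arguments.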


Why this is ReAct and not just tool calling

                                       Single call   ReAct loop
Decision after each observation        ❌            ✅
Explicit stop reasons                  ❌            ✅
Control of repeated identical actions  ❌            ✅
Run budget (steps/tools/time)          partial       ✅

Typical stop_reason values

  • success β€” the agent returned a final answer
  • max_steps β€” step budget exhausted
  • max_tool_calls β€” tool call limit exhausted
  • max_seconds β€” time budget exceeded
  • llm_timeout β€” LLM did not reply within OPENAI_TIMEOUT_SECONDS
  • loop_detected β€” same tool call with the same args repeated
  • tool_denied:<name> β€” tool is not in allowlist
  • invalid_action:* β€” model returned an invalid action structure
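
One way to consume these values in monitoring is to map them to log severities. The mapping below is an illustrative sketch, not part of the example code; tune the levels to your own alerting policy:

```python
# Hypothetical severity routing for stop_reason values.
SEVERITY = {
    "success": "info",
    "max_steps": "warning",
    "max_tool_calls": "warning",
    "max_seconds": "warning",
    "llm_timeout": "error",
    "loop_detected": "warning",
}

def severity(stop_reason: str) -> str:
    # Prefixed reasons like "tool_denied:<name>" map on the prefix.
    key = stop_reason.split(":", 1)[0]
    if key in ("tool_denied", "invalid_action", "tool_missing", "tool_error"):
        return "error"
    return SEVERITY.get(key, "error")
```

Unknown reasons fall through to "error" so that a new stop_reason surfaces loudly instead of being silently dropped.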

What is NOT shown here

  • No auth/PII and production access controls for personal data.
  • No retry/backoff policies for LLM and tool layer.
  • No token/cost budgets (cost guardrails).
  • Tools here are deterministic learning mocks, not real external APIs.

What to try next

  • Remove search_policy from ALLOWED_TOOLS and observe how stop_reason changes.
  • Set max_tool_calls=1 and verify policy stops the agent, not the model.
  • Change GOAL to user_id=7 (Max) and validate the final answer.
  • Keep a tool name in ALLOWED_TOOLS but remove it from TOOL_REGISTRY - you will see tool_missing:<name> (a tool outside the allowlist is stopped earlier, as tool_denied:<name>).
  • Add a soft-loop mode: normalize string args (trim + collapse spaces) before hashing to catch semantically identical repeats.
  • Add JSONL step logs (trace) for production observability.
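
A possible shape for the soft-loop idea from the list above. The normalize_args helper is hypothetical (not in the repo), and lowercasing is an extra assumption beyond trim + collapse:

```python
# Normalize string args before hashing so semantically identical
# repeats produce the same signature and trip the loop guard.
import hashlib
import json
import re

def normalize_args(args: dict) -> dict:
    out = {}
    for key, value in args.items():
        if isinstance(value, str):
            # trim, collapse internal whitespace, lowercase
            value = re.sub(r"\s+", " ", value.strip()).lower()
        out[key] = value
    return out

def soft_args_hash(args: dict) -> str:
    raw = json.dumps(normalize_args(args), sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
```

With this in place, "  Refund   policy " and "refund policy" hash identically, so a model retrying the same query with cosmetic changes is still caught.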

Full code on GitHub

The repository contains the full runnable version of this example: ReAct loop, policy boundary, allowlist, budgets, loop detection, and stop reasons.

View full code on GitHub β†—
⏱️ 13 min read • Updated Mar 2026 • Difficulty: ★★☆
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.