ReAct Agent β€” Python (full implementation with LLM)

Production-style runnable ReAct agent example in Python with action schema, tool allowlist, budgets, loop detection, and stop reasons.
On this page
  1. Pattern Essence (Brief)
  2. What this example demonstrates
  3. Architecture
  4. Project structure
  5. How to run
  6. Task
  7. Solution
  8. Code
  9. tools.py β€” tools (source of facts)
  10. gateway.py β€” policy boundary (the most important layer)
  11. llm.py β€” decision step (Think)
  12. main.py β€” full ReAct loop
  13. requirements.txt
  14. Example output
  15. Why this is ReAct and not just tool calling
  16. Typical stop_reason values
  17. What is NOT shown here
  18. What to try next
  19. Full code on GitHub

Pattern Essence (Brief)

ReAct Agent is a pattern where the agent works iteratively: thinks, chooses an action, executes it, and analyzes the result before the next step.

The model makes decisions at each iteration, while tool execution goes through a controlled gateway with action validation, allowlist, and runtime budgets.


What this example demonstrates

  • full Think -> Act -> Observe loop
  • separate policy boundary between decision (LLM) and tools (execution layer)
  • strict action format: only tool or final
  • tool allowlist (deny by default)
  • run budgets: max_steps, max_tool_calls, max_seconds
  • loop detection for repeated calls of the same tool with the same args
  • explicit stop_reason values for debugging and production monitoring

Architecture

  1. LLM receives the goal + step history and returns a JSON action.
  2. The system validates the action (validate_action).
  3. If it is a tool, ToolGateway checks allowlist/budgets/loop detection and executes the tool.
  4. Observation is added to history and becomes new evidence for the next decision step.
  5. If it is final, the run ends with stop_reason="success".

The LLM returns intent (a JSON action), which is treated as untrusted input: the policy boundary validates it first and only then, if allowed, calls tools.

This keeps ReAct controllable: the model makes decisions, and policy logic controls execution.
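
The steps above can be sketched as a minimal offline loop. This is an illustration only, not the example code: decide() is a hard-coded stand-in for the LLM, and the tool and allowlist are toy placeholders.

```python
# Minimal, self-contained sketch of steps 1-5 above.
# decide() stands in for the LLM and is hard-coded for illustration.

def decide(history):
    # Think: the real LLM returns a JSON action based on goal + history.
    if not history:
        return {"kind": "tool", "name": "get_time", "args": {}}
    return {"kind": "final", "answer": "done"}

TOOLS = {"get_time": lambda: {"time": "12:00"}}   # execution layer
ALLOW = {"get_time"}                              # deny by default

def run(max_steps=5):
    history = []
    for _ in range(max_steps):
        action = decide(history)                               # 1. decide
        if action["kind"] == "final":                          # 5. finish
            return {"stop_reason": "success", "answer": action["answer"]}
        if action["name"] not in ALLOW:                        # 3. policy check
            return {"stop_reason": f"tool_denied:{action['name']}"}
        observation = TOOLS[action["name"]](**action["args"])  # 3. execute
        history.append({"action": action, "observation": observation})  # 4. observe
    return {"stop_reason": "max_steps"}

print(run())
```

The full example below adds what this sketch omits: action validation, budgets, loop detection, and timeouts.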


Project structure

TEXT
examples/
└── agent-patterns/
    └── react-agent/
        └── python/
            β”œβ”€β”€ main.py           # ReAct loop
            β”œβ”€β”€ llm.py            # LLM decision step (JSON action)
            β”œβ”€β”€ gateway.py        # policy boundary: validation, allowlist, budgets, loop detection
            β”œβ”€β”€ tools.py          # deterministic tools (Anna/Max, USD, policies)
            └── requirements.txt

How to run

BASH
git clone https://github.com/AgentPatterns-tech/agentpatterns.git
cd agentpatterns

cd examples/agent-patterns/react-agent/python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Python 3.11+ is required.

Option via export:

BASH
export OPENAI_API_KEY="sk-..."
# optional:
# export OPENAI_MODEL="gpt-4.1-mini"
# export OPENAI_TIMEOUT_SECONDS="60"

python main.py

Option via .env (optional)

BASH
cat > .env <<'EOF'
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4.1-mini
OPENAI_TIMEOUT_SECONDS=60
EOF

set -a
source .env
set +a

python main.py

This is the shell variant (macOS/Linux). On Windows, set environment variables with set (cmd) or $env: (PowerShell), or use python-dotenv to load .env automatically.


Task

Imagine a user writes to support:

"Can I get a refund for my subscription right now?"

The agent should not answer immediately. It must:

  • gather facts through tools (profile, billing, policy)
  • decide what to do next after each step
  • provide the final answer only when facts are sufficient

Solution

Here the agent works step by step (ReAct):

  • at each step, the model chooses: call a tool or finish
  • the system checks that the action is valid and allowed
  • the tool returns a result that is added to history
  • based on this history, the agent makes the next step
  • when data is sufficient, the agent returns a short final answer

Code

tools.py β€” tools (source of facts)

PYTHON
from __future__ import annotations

from typing import Any

USERS = {
    42: {"id": 42, "name": "Anna", "country": "US", "tier": "pro"},
    7: {"id": 7, "name": "Max", "country": "US", "tier": "free"},
}

BILLING = {
    42: {
        "currency": "USD",
        "plan": "pro_monthly",
        "price_usd": 49.0,
        "days_since_first_payment": 10,
    },
    7: {
        "currency": "USD",
        "plan": "free",
        "price_usd": 0.0,
        "days_since_first_payment": 120,
    },
}

POLICY_DOCS = [
    {
        "id": "refund-v3",
        "title": "Refund Policy",
        "snippet": "Pro monthly subscriptions are refundable within 14 days from the first payment.",
    },
    {
        "id": "free-v1",
        "title": "Free Plan Policy",
        "snippet": "Free plan has no billable payments and cannot be refunded.",
    },
    {
        "id": "billing-v2",
        "title": "Billing Rules",
        "snippet": "All refunds are returned to the original payment method in USD.",
    },
]


def get_user_profile(user_id: int) -> dict[str, Any]:
    user = USERS.get(user_id)
    if not user:
        return {"error": f"user {user_id} not found"}
    return {"user": user}


def get_user_billing(user_id: int) -> dict[str, Any]:
    billing = BILLING.get(user_id)
    if not billing:
        return {"error": f"billing record for user {user_id} not found"}
    return {"billing": billing}


def search_policy(query: str) -> dict[str, Any]:
    words = [w for w in query.lower().split() if w]

    def score(doc: dict[str, str]) -> int:
        haystack = f"{doc['title']} {doc['snippet']}".lower()
        return sum(1 for word in words if word in haystack)

    ranked = sorted(POLICY_DOCS, key=score, reverse=True)
    top = [doc for doc in ranked if score(doc) > 0][:2]
    if not top:
        top = POLICY_DOCS[:1]

    return {"matches": top}

What matters most here (plain words)

  • Tools are deterministic and contain no LLM logic. The agent only decides which tool to call, but does not execute business logic itself.
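
To see that determinism in isolation, here is a trimmed replica of search_policy's keyword scoring (the doc set is reduced to two illustrative entries; the real tool returns full doc dicts, not just ids):

```python
# Keyword ranking in the style of search_policy: count query words
# found in each doc, sort by score, keep only docs that matched.
DOCS = [
    {"id": "refund-v3", "text": "Refund Policy: refundable within 14 days"},
    {"id": "free-v1", "text": "Free Plan Policy: cannot be refunded"},
]

def search(query):
    words = query.lower().split()

    def score(doc):
        return sum(1 for w in words if w in doc["text"].lower())

    ranked = sorted(DOCS, key=score, reverse=True)
    return [d["id"] for d in ranked if score(d) > 0]
```

The same query always produces the same ranking, which keeps tool outputs reproducible across runs.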

gateway.py β€” policy boundary (the most important layer)

PYTHON
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


class StopRun(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


@dataclass(frozen=True)
class Budget:
    max_steps: int = 8
    max_tool_calls: int = 6
    max_seconds: int = 20


def _stable_json(value: Any) -> str:
    if value is None or isinstance(value, (bool, int, float, str)):
        return json.dumps(value, ensure_ascii=True, sort_keys=True)
    if isinstance(value, list):
        return "[" + ",".join(_stable_json(item) for item in value) + "]"
    if isinstance(value, dict):
        parts = []
        for key in sorted(value):
            parts.append(
                json.dumps(str(key), ensure_ascii=True) + ":" + _stable_json(value[key])
            )
        return "{" + ",".join(parts) + "}"
    return json.dumps(str(value), ensure_ascii=True)


def args_hash(args: dict[str, Any]) -> str:
    raw = _stable_json(args or {})
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def validate_action(action: Any) -> dict[str, Any]:
    if not isinstance(action, dict):
        raise StopRun("invalid_action:not_object")

    kind = action.get("kind")
    if kind == "invalid":
        raise StopRun("invalid_action:bad_json")
    if kind not in {"tool", "final"}:
        raise StopRun("invalid_action:bad_kind")

    if kind == "final":
        allowed = {"kind", "answer"}
        extra = set(action.keys()) - allowed
        if extra:
            raise StopRun("invalid_action:extra_keys")
        answer = action.get("answer")
        if not isinstance(answer, str) or not answer.strip():
            raise StopRun("invalid_action:missing_answer")
        return {"kind": "final", "answer": answer.strip()}

    allowed = {"kind", "name", "args"}
    extra = set(action.keys()) - allowed
    if extra:
        raise StopRun("invalid_action:extra_keys")

    name = action.get("name")
    if not isinstance(name, str) or not name:
        raise StopRun("invalid_action:missing_tool_name")

    args = action.get("args", {})
    if args is None:
        args = {}
    if not isinstance(args, dict):
        raise StopRun("invalid_action:bad_args")

    return {"kind": "tool", "name": name, "args": args}


class ToolGateway:
    def __init__(
        self,
        *,
        allow: set[str],
        registry: dict[str, Callable[..., dict[str, Any]]],
        budget: Budget,
    ):
        self.allow = set(allow)
        self.registry = registry
        self.budget = budget
        self.tool_calls = 0
        self.seen_calls: set[str] = set()

    def call(self, name: str, args: dict[str, Any]) -> dict[str, Any]:
        self.tool_calls += 1
        if self.tool_calls > self.budget.max_tool_calls:
            raise StopRun("max_tool_calls")

        if name not in self.allow:
            raise StopRun(f"tool_denied:{name}")

        tool = self.registry.get(name)
        if tool is None:
            raise StopRun(f"tool_missing:{name}")

        signature = f"{name}:{args_hash(args)}"
        if signature in self.seen_calls:
            raise StopRun("loop_detected")
        self.seen_calls.add(signature)

        try:
            return tool(**args)
        except TypeError as exc:
            raise StopRun(f"tool_bad_args:{name}") from exc
        except Exception as exc:
            raise StopRun(f"tool_error:{name}") from exc

What matters most here (plain words)

  • validate_action(...) is the governance/control layer: the system accepts only the allowed action contract and rejects extra fields (invalid_action:extra_keys).
  • Budget + StopRun(...) is a production pattern for controlled shutdown: runs do not drift indefinitely, they stop with a clear reason.
  • ToolGateway.call(...) enforces the agent ≠ executor boundary: the agent only proposes an action, while the actual tool call is done by a controlled system layer.
  • loop_detected catches exact repeats (same tool + same args). Semantic loop detection is a separate option (see β€œWhat to try next”).
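
The budget mechanics can be exercised without the LLM. MiniGateway below is a stripped-down stand-in for ToolGateway that keeps only the max_tool_calls check, to show how StopRun turns an exhausted budget into an explicit stop reason:

```python
# Minimal re-creation of the budget check from ToolGateway.call.
class StopRun(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason

class MiniGateway:
    def __init__(self, max_tool_calls: int):
        self.max_tool_calls = max_tool_calls
        self.tool_calls = 0

    def call(self, tool, **args):
        # Same order as ToolGateway.call: count first, then compare.
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise StopRun("max_tool_calls")
        return tool(**args)

gw = MiniGateway(max_tool_calls=1)
first = gw.call(lambda: {"ok": True})   # within budget
try:
    gw.call(lambda: {"ok": True})       # exceeds the budget
    stop_reason = None
except StopRun as exc:
    stop_reason = exc.reason

print(stop_reason)
```

Note that the counter increments before the comparison, so the Nth call is the last one allowed and call N+1 stops the run.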

llm.py β€” decision step (Think)

The LLM sees only the catalog of available tools; if it requests a tool outside the allowlist, the gateway stops the run.

PYTHON
from __future__ import annotations

import json
import os
from typing import Any

from openai import APIConnectionError, APITimeoutError, OpenAI

MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1-mini")
LLM_TIMEOUT_SECONDS = float(os.getenv("OPENAI_TIMEOUT_SECONDS", "60"))


class LLMTimeout(Exception):
    pass

SYSTEM_PROMPT = """
You are a ReAct decision engine.
Return only one JSON object with one of these shapes:
1) {"kind":"tool","name":"<tool_name>","args":{...}}
2) {"kind":"final","answer":"<short final answer>"}

Rules:
- Use tools when you do not have enough facts.
- Do not invent tool outputs.
- Prefer the smallest next step.
- When evidence is sufficient, return "final".
- Never output markdown or extra keys.
""".strip()

TOOL_CATALOG = [
    {
        "name": "get_user_profile",
        "description": "Get user profile by user_id",
        "args": {"user_id": "integer"},
    },
    {
        "name": "get_user_billing",
        "description": "Get billing info by user_id",
        "args": {"user_id": "integer"},
    },
    {
        "name": "search_policy",
        "description": "Search refund and billing policy snippets",
        "args": {"query": "string"},
    },
]


def _get_client() -> OpenAI:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise EnvironmentError(
            "OPENAI_API_KEY is not set. Run: export OPENAI_API_KEY='sk-...'"
        )
    return OpenAI(api_key=api_key)


def _build_state_summary(history: list[dict[str, Any]]) -> dict[str, Any]:
    tools_used = [
        step.get("action", {}).get("name")
        for step in history
        if isinstance(step, dict)
        and isinstance(step.get("action"), dict)
        and step.get("action", {}).get("kind") == "tool"
    ]
    last_observation = history[-1].get("observation") if history else None
    return {
        "steps_completed": len(history),
        "tools_used": tools_used,
        "last_observation": last_observation,
    }


def decide_next_action(goal: str, history: list[dict[str, Any]]) -> dict[str, Any]:
    # Keep full history in memory, but send summary + last N steps
    # so the prompt remains stable as runs get longer.
    recent_history = history[-3:]
    payload = {
        "goal": goal,
        "state_summary": _build_state_summary(history),
        "recent_history": recent_history,
        "available_tools": TOOL_CATALOG,
    }

    client = _get_client()
    try:
        completion = client.chat.completions.create(
            model=MODEL,
            temperature=0,
            timeout=LLM_TIMEOUT_SECONDS,
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": json.dumps(payload, ensure_ascii=True)},
            ],
        )
    except (APITimeoutError, APIConnectionError) as exc:
        raise LLMTimeout("llm_timeout") from exc

    text = completion.choices[0].message.content or "{}"
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"kind": "invalid", "raw": text}

What matters most here (plain words)

  • timeout=LLM_TIMEOUT_SECONDS + LLMTimeout is a production pattern: if the model hangs, the run ends with an explicit llm_timeout instead of blocking.
  • state_summary + recent_history is a production scaling pattern: context grows in a controlled way instead of expanding without limit at each step.
  • SYSTEM_PROMPT defines only the intent format (tool/final): the LLM decides what to do, but does not execute tools itself.
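
The summarization step can be exercised in isolation. The function below is copied from llm.py; the sample history entry is illustrative:

```python
# _build_state_summary keeps only tool names and the last observation,
# so the prompt stays bounded as the run gets longer.
from typing import Any

def _build_state_summary(history: list[dict[str, Any]]) -> dict[str, Any]:
    tools_used = [
        step.get("action", {}).get("name")
        for step in history
        if isinstance(step, dict)
        and isinstance(step.get("action"), dict)
        and step.get("action", {}).get("kind") == "tool"
    ]
    last_observation = history[-1].get("observation") if history else None
    return {
        "steps_completed": len(history),
        "tools_used": tools_used,
        "last_observation": last_observation,
    }

history = [
    {
        "step": 1,
        "action": {"kind": "tool", "name": "get_user_profile", "args": {"user_id": 42}},
        "observation": {"user": {"id": 42, "name": "Anna"}},
    },
]
summary = _build_state_summary(history)
print(summary)
```

Only the tool names and the most recent observation survive into the summary; earlier observations are carried by recent_history, not repeated in full.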

main.py β€” full ReAct loop

PYTHON
from __future__ import annotations

import json
import time
from typing import Any

from gateway import Budget, StopRun, ToolGateway, args_hash, validate_action
from llm import LLMTimeout, decide_next_action
from tools import get_user_billing, get_user_profile, search_policy

GOAL = (
    "User 42 asked: Can I get a refund now? "
    "Use tools to verify profile, billing state, and policy. "
    "Return a short final answer in English with USD amount and reason."
)

BUDGET = Budget(max_steps=8, max_tool_calls=5, max_seconds=20)

TOOL_REGISTRY = {
    "get_user_profile": get_user_profile,
    "get_user_billing": get_user_billing,
    "search_policy": search_policy,
}

ALLOWED_TOOLS = {"get_user_profile", "get_user_billing", "search_policy"}


def run_react(goal: str) -> dict[str, Any]:
    started = time.monotonic()
    history: list[dict[str, Any]] = []
    trace: list[dict[str, Any]] = []

    gateway = ToolGateway(allow=ALLOWED_TOOLS, registry=TOOL_REGISTRY, budget=BUDGET)

    for step in range(1, BUDGET.max_steps + 1):
        elapsed = time.monotonic() - started
        if elapsed > BUDGET.max_seconds:
            return {
                "status": "stopped",
                "stop_reason": "max_seconds",
                "trace": trace,
                "history": history,
            }

        try:
            raw_action = decide_next_action(goal=goal, history=history)
        except LLMTimeout:
            return {
                "status": "stopped",
                "stop_reason": "llm_timeout",
                "trace": trace,
                "history": history,
            }

        try:
            action = validate_action(raw_action)
        except StopRun as exc:
            return {
                "status": "stopped",
                "stop_reason": exc.reason,
                "raw_action": raw_action,
                "trace": trace,
                "history": history,
            }

        if action["kind"] == "final":
            return {
                "status": "ok",
                "stop_reason": "success",
                "answer": action["answer"],
                "trace": trace,
                "history": history,
            }

        tool_name = action["name"]
        tool_args = action["args"]

        try:
            observation = gateway.call(tool_name, tool_args)
            trace.append(
                {
                    "step": step,
                    "tool": tool_name,
                    "args_hash": args_hash(tool_args),
                    "ok": True,
                }
            )
        except StopRun as exc:
            trace.append(
                {
                    "step": step,
                    "tool": tool_name,
                    "args_hash": args_hash(tool_args),
                    "ok": False,
                    "stop_reason": exc.reason,
                }
            )
            return {
                "status": "stopped",
                "stop_reason": exc.reason,
                "trace": trace,
                "history": history,
            }

        history.append(
            {
                "step": step,
                "action": action,
                "observation": observation,
            }
        )

    return {
        "status": "stopped",
        "stop_reason": "max_steps",
        "trace": trace,
        "history": history,
    }


def main() -> None:
    result = run_react(GOAL)
    print(json.dumps(result, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()

What matters most here (plain words)

  • run_react(...) controls the loop and stop conditions; business actions are executed only through ToolGateway.
  • validate_action(...) and gateway.call(...) inside the loop are the governance/control layer in action at every step.
  • Splitting decide_next_action(...) and gateway.call(...) is the key principle of agent β‰  executor: the agent returns intent, and tools are called only through the policy boundary.

requirements.txt

TEXT
openai==2.21.0

Example output

Tool call order may vary slightly between runs, but stop_reason and policy gates (allowlist, budget, validation) remain stable.

JSON
{
  "status": "ok",
  "stop_reason": "success",
  "answer": "Yes, you can get a refund of USD 49.00 ...",
  "trace": [
    {"step": 1, "tool": "get_user_profile", "args_hash": "...", "ok": true},
    {"step": 2, "tool": "get_user_billing", "args_hash": "...", "ok": true},
    {"step": 3, "tool": "search_policy", "args_hash": "...", "ok": true}
  ],
  "history": [{...}]
}

history is the step execution log: for each step, it stores action (what the agent decided to do) and observation (what the tool returned).

args_hash is a hash of the arguments only, so for the same user_id it may match across different tools; the loop guard therefore checks the tool + args_hash combination.
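
Because _stable_json sorts keys recursively, args_hash is independent of dict key order, which is what makes the loop guard reliable. Both functions below are copied from gateway.py:

```python
# args_hash from gateway.py: recursive key sorting means the same
# arguments always hash to the same 12-hex-char signature.
import hashlib
import json
from typing import Any

def _stable_json(value: Any) -> str:
    if value is None or isinstance(value, (bool, int, float, str)):
        return json.dumps(value, ensure_ascii=True, sort_keys=True)
    if isinstance(value, list):
        return "[" + ",".join(_stable_json(item) for item in value) + "]"
    if isinstance(value, dict):
        parts = []
        for key in sorted(value):
            parts.append(
                json.dumps(str(key), ensure_ascii=True) + ":" + _stable_json(value[key])
            )
        return "{" + ",".join(parts) + "}"
    return json.dumps(str(value), ensure_ascii=True)

def args_hash(args: dict[str, Any]) -> str:
    raw = _stable_json(args or {})
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]

a = args_hash({"user_id": 42, "query": "refund"})
b = args_hash({"query": "refund", "user_id": 42})  # same args, different order
```

a and b are equal: key order does not change the signature, so a repeated call cannot sneak past the loop guard by reordering arguments.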


Why this is ReAct and not just tool calling

                                       Single call   ReAct loop
Decision after each observation        ❌            ✅
Explicit stop reasons                  ❌            ✅
Control of repeated identical actions  ❌            ✅
Run budget (steps/tools/time)          partial       ✅

Typical stop_reason values

  • success β€” the agent returned a final answer
  • max_steps β€” step budget exhausted
  • max_tool_calls β€” tool call limit exhausted
  • max_seconds β€” time budget exceeded
  • llm_timeout β€” LLM did not reply within OPENAI_TIMEOUT_SECONDS
  • loop_detected β€” same tool call with the same args repeated
  • tool_denied:<name> β€” tool is not in allowlist
  • invalid_action:* β€” model returned an invalid action structure
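
One way to consume these values in monitoring is to map them to log severities. The mapping below is an illustrative sketch, not part of the example code; tune the levels to your own alerting policy:

```python
# Hypothetical severity routing for stop_reason values.
SEVERITY = {
    "success": "info",
    "max_steps": "warning",
    "max_tool_calls": "warning",
    "max_seconds": "warning",
    "llm_timeout": "error",
    "loop_detected": "warning",
}

def severity(stop_reason: str) -> str:
    # Prefixed reasons like "tool_denied:<name>" map on the prefix.
    key = stop_reason.split(":", 1)[0]
    if key in ("tool_denied", "invalid_action", "tool_missing", "tool_error"):
        return "error"
    return SEVERITY.get(key, "error")
```

Unknown reasons fall through to "error" so that a new stop_reason surfaces loudly instead of being silently dropped.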

What is NOT shown here

  • No auth/PII and production access controls for personal data.
  • No retry/backoff policies for LLM and tool layer.
  • No token/cost budgets (cost guardrails).
  • Tools here are deterministic learning mocks, not real external APIs.

What to try next

  • Remove search_policy from ALLOWED_TOOLS and observe how stop_reason changes.
  • Set max_tool_calls=1 and verify policy stops the agent, not the model.
  • Change GOAL to user_id=7 (Max) and validate the final answer.
  • Keep a tool name in ALLOWED_TOOLS but remove it from TOOL_REGISTRY - you will see tool_missing:<name> (a tool outside the allowlist is stopped earlier, as tool_denied:<name>).
  • Add a soft-loop mode: normalize string args (trim + collapse spaces) before hashing to catch semantically identical repeats.
  • Add JSONL step logs (trace) for production observability.
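
A possible shape for the soft-loop idea from the list above. The normalize_args helper is hypothetical (not in the repo), and lowercasing is an extra assumption beyond trim + collapse:

```python
# Normalize string args before hashing so semantically identical
# repeats produce the same signature and trip the loop guard.
import hashlib
import json
import re

def normalize_args(args: dict) -> dict:
    out = {}
    for key, value in args.items():
        if isinstance(value, str):
            # trim, collapse internal whitespace, lowercase
            value = re.sub(r"\s+", " ", value.strip()).lower()
        out[key] = value
    return out

def soft_args_hash(args: dict) -> str:
    raw = json.dumps(normalize_args(args), sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
```

With this in place, "  Refund   policy " and "refund policy" hash identically, so a model retrying the same query with cosmetic changes is still caught.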

Full code on GitHub

The repository contains the full runnable version of this example: ReAct loop, policy boundary, allowlist, budgets, loop detection, and stop reasons.

View full code on GitHub β†—
⏱️ 13 min read • Updated Mar 2026 • Difficulty: ★★☆
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.