Agent Memory: What It Remembers and Why

On this page
  1. What agent memory is and what it consists of
  2. Short-term vs long-term memory
  3. What the agent remembers within a task
  4. What the agent can remember between tasks
  5. Memory limit: context window
  6. In code this looks like
  7. 1) Short-term memory: what the agent "sees" now
  8. 2) Context limit: old items can fall out
  9. 3) Long-term memory: what we store between tasks
  10. 4) In a new task we read this data back
  11. 5) The agent builds a response using memory
  12. Analogy from everyday life
  13. In short
  14. FAQ
  15. What's next

When an agent executes a task, it does not just react to the current instruction.

It takes into account what already happened before: what you asked, what it already did, and what result it got.


Without this, every new action would be like a first attempt.

It would call the same API again or repeat a step that had already failed, sometimes endlessly.


Memory is exactly what lets an agent move forward instead of going in circles.

What agent memory is and what it consists of


Agent memory is not one place where all information is stored.

It is a set of mechanisms that let the agent:

  • keep context for the current task
  • use experience from previous ones

Without them, the agent does not know:

  • what has already been done
  • what worked
  • what to do next

Short-term vs long-term memory


Not all agent memory is the same.

There is memory during a task.
And there is memory between tasks.


Short-term exists only during the task.

This is the context of the current conversation:

  • your instructions
  • agent responses
  • results of previous steps

When the task is finished, this context disappears.

Next time, the agent starts with a "clean slate."


Long-term is stored between tasks.

It allows the agent to:

  • remember preferences
  • account for previous experience
  • use data from past tasks

Without it, each new task is like the first one.


A simple example:

You ask the agent: "Make a report like last time."

With short-term memory only, it does not know what "last time" was.
With long-term memory, it knows the format, sources, and structure, and can repeat it.

                         Short-term   Long-term
Works during task        ✅           ❌
Stored between tasks     ❌           ✅
Has limits               ✅           ❌
Needs storage            ❌           ✅

What the agent remembers within a task

When the agent works on a task, it "sees" only the current conversation.

Everything you write.
Everything the agent answers.
Every result it gets from tools.

That is its short-term memory: context.


It uses this to:

  • Understand what is happening now
  • Decide which step to take next
  • And avoid repeating what is already done

But this context is not unlimited.

If the conversation becomes too long, part of older information simply falls out.

The agent no longer sees it.

And it can:

  • Forget the original requirement
  • Lose an important detail
  • Or do an action it already performed earlier

What the agent can remember between tasks

When a task is completed, conversation context disappears.

But that does not mean the agent must forget everything forever.

It can save part of the information in external memory.


This can be:

  • A database
  • A file
  • Or another storage

Where the agent writes:

  • Preferences
  • Previous decisions
  • Or important facts

And on the next task, it can read this data back.

That is how it remembers:

  • How you work
  • Which formats you use
  • Or what it did before

Even if the previous conversation ended long ago.
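One of the storage options mentioned above is a plain file. A minimal sketch of what that could look like (the file name `agent_memory.json` and the structure of the stored facts are assumptions for illustration):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # assumed location for this sketch

def save_memory(data: dict) -> None:
    """Persist long-term memory so it survives between tasks."""
    MEMORY_FILE.write_text(json.dumps(data, indent=2))

def load_memory() -> dict:
    """Read memory back at the start of a new task; empty if nothing saved yet."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {}

# At the end of one task:
save_memory({"user:anna": {"report_format": "short-bullets"}})

# At the start of the next task, even after a restart:
prefs = load_memory().get("user:anna", {})
```

The conversation context is gone, but the saved preferences are not.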

Memory limit: context window

Agent short-term memory has limits.

It cannot remember the entire conversation in full.

There is a maximum amount of context the model can "see" at once.

This is called the context window.


When the conversation becomes too long, part of old information simply does not fit anymore.

It falls out of context.

The agent stops considering it.


Because of this, it can:

  • Forget the original requirement
  • Lose an important detail
  • Or repeat an action it already performed earlier
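A context window is measured in tokens, not messages. A rough sketch of how a system might trim history to fit a budget (the 4-characters-per-token estimate and the budget of 15 tokens are illustrative assumptions; real tokenizers differ):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (real tokenizers differ).
    return max(1, len(text) // 4)

def trim_to_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break                           # older messages fall out here
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "Original requirement: weekly report, USD only"},
    {"role": "assistant", "content": "Collecting data..."},
    {"role": "tool", "content": "sales_total=12400"},
]
visible = trim_to_window(history, max_tokens=15)
# The original requirement no longer fits: only the two newest messages remain.
```

This is exactly the failure mode above: the oldest message, which held the original requirement, is the first one to drop out.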

In code this looks like

Below is the same principle in a simple format:
there is short-term memory (task context) and long-term memory (external storage between tasks).

1) Short-term memory: what the agent "sees" now

This is current messages and recent step results:

PYTHON
short_memory = [
    {"role": "user", "content": "Prepare a weekly sales report"},
    {"role": "assistant", "content": "Okay, starting data collection"},
    {"role": "tool", "content": "sales_total=12400"},
]

2) Context limit: old items can fall out

If context is limited, the system keeps only recent items:

PYTHON
MAX_ITEMS = 3
short_memory = short_memory[-MAX_ITEMS:]

Because of this, the agent may not see early instructions.

3) Long-term memory: what we store between tasks

Separately, we keep a storage with useful facts:

PYTHON
long_memory_store = {
    "user:anna": {
        "report_format": "short-bullets",
        "currency": "USD",
    }
}

4) In a new task we read this data back

Before responding, the agent retrieves saved preferences:

PYTHON
user_prefs = long_memory_store.get("user:anna", {})

task_context = {
    "request": "Make a report like last time",
    "prefs": user_prefs,
}

5) The agent builds a response using memory

PYTHON
def build_report(context: dict):
    fmt = context["prefs"].get("report_format", "default")
    currency = context["prefs"].get("currency", "USD")
    return f"Report format={fmt}, currency={currency}"


result = build_report(task_context)
# "Report format=short-bullets, currency=USD"

Without long-term memory, this would be format=default.
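Putting the steps together, one full task cycle might look like this (a sketch under the same assumptions as the snippets above; the in-memory dict stands in for a real database or file):

```python
# Long-term store survives between tasks (a dict standing in for a DB or file).
long_memory_store = {
    "user:anna": {"report_format": "short-bullets", "currency": "USD"},
}

def run_task(user_id: str, request: str) -> str:
    # 1) Fresh short-term memory for this task.
    short_memory = [{"role": "user", "content": request}]

    # 2) Pull long-term facts into the current context.
    prefs = long_memory_store.get(user_id, {})

    # 3) Act using both memories.
    fmt = prefs.get("report_format", "default")
    result = f"Report format={fmt}"
    short_memory.append({"role": "assistant", "content": result})

    # 4) Persist anything worth keeping before the context disappears.
    long_memory_store.setdefault(user_id, {})["last_request"] = request
    return result

first = run_task("user:anna", "Make a report like last time")
# "Report format=short-bullets" -- Anna's saved preference is used.

second = run_task("user:bob", "Make a report like last time")
# "Report format=default" -- Bob has no history, so defaults apply.
```

Short-term memory is rebuilt from scratch on every call; only what was written to the store carries over.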


Analogy from everyday life

Imagine you are on a phone call, but you can hear only the last 30 seconds of it.

You know what the other person just said.
You remember the latest response.
And you can continue the conversation.


But if they say:

"As I explained at the beginning..."

you did not hear that beginning.

It simply dropped out.


And you might:

  • Ask the same thing again
  • Misunderstand the task
  • Or answer off-topic

If you have notes from previous calls, you can read them and restore context.

This is exactly how the agent uses short-term and long-term memory.

In short

Quick take

The agent has two memory types:

  • Short-term: context of the current task
  • Long-term: saved data between tasks

Short-term memory is limited:
part of information can disappear from context.

Long-term memory allows it to:
store experience and use it later.

FAQ

Q: Does an agent remember previous tasks?
A: Only if that information is saved in long-term memory outside the current conversation.

Q: Why can an agent forget the original instruction?
A: Because of context window limits: part of older information can drop out of short-term memory.

Q: Why does an agent need long-term memory?
A: To store important data between tasks and use it in the future.

What's next

Now you know what the agent remembers and how this helps it move forward.

But memory is only part of the picture.

Because the agent does not just remember actions.
It executes them.

And not all actions are equally safe.

One thing is to read data.
Another is to change it.
Or delete it.
Or spend money on API calls.

That is why an agent needs not only to know what to do.
It needs to know what it is allowed to do.

⏱ 7 min read • Updated Mar 2026 • Difficulty: ★★☆
Practical continuation

Pattern implementation examples

Continue with implementation using example projects.

Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.