Agent Memory: What It Remembers and Why

On this page
  1. What agent memory is and what it consists of
  2. Short-term vs long-term memory
  3. What the agent remembers within a task
  4. What the agent can remember between tasks
  5. Memory limit: context window
  6. In code this looks like
  7. 1) Short-term memory: what the agent "sees" now
  8. 2) Context limit: old items can fall out
  9. 3) Long-term memory: what we store between tasks
  10. 4) In a new task we read this data back
  11. 5) The agent builds a response using memory
  12. Analogy from everyday life
  13. In short
  14. FAQ
  15. What's next

When an agent executes a task, it does not just react to the current instruction.

It takes into account what already happened before: what you asked, what it already did, and what result it got.


Without this, every new action would be like a first attempt.

It would call the same API again or repeat a step that had already failed, sometimes endlessly.


Memory is exactly what lets an agent move forward instead of going in circles.

What agent memory is and what it consists of


Agent memory is not one place where all information is stored.

It is a set of mechanisms that let the agent:

  • keep context for the current task
  • use experience from previous ones

Without them, the agent does not know:

  • what has already been done
  • what worked
  • what to do next

Short-term vs long-term memory


Not all agent memory is the same.

There is memory during a task.
And there is memory between tasks.


Short-term exists only during the task.

This is the context of the current conversation:

  • your instructions
  • agent responses
  • results of previous steps

When the task is finished, this context disappears.

Next time, the agent starts with a "clean slate."


Long-term is stored between tasks.

It allows the agent to:

  • remember preferences
  • account for previous experience
  • use data from past tasks

Without it, each new task is like the first one.


A simple example:

You ask the agent: "Make a report like last time."

With short-term memory only, it does not know what "last time" was.
With long-term memory, it knows the format, sources, and structure, and can repeat it.

                         Short-term   Long-term
Works during task        ✅           ❌
Stored between tasks     ❌           ✅
Has limits               ✅           ❌
Needs storage            ❌           ✅

What the agent remembers within a task

When the agent works on a task, it "sees" only the current conversation.

Everything you write.
Everything the agent answers.
Every result it gets from tools.

That is its short-term memory: context.


It uses this to:

  • Understand what is happening now
  • Decide which step to take next
  • And avoid repeating what is already done

But this context is not unlimited.

If the conversation becomes too long, part of older information simply falls out.

The agent no longer sees it.

And it can:

  • Forget the original requirement
  • Lose an important detail
  • Or do an action it already performed earlier

What the agent can remember between tasks

When a task is completed, conversation context disappears.

But that does not mean the agent must forget everything forever.

It can save part of the information in external memory.


This can be:

  • A database
  • A file
  • Or another storage

Where the agent writes:

  • Preferences
  • Previous decisions
  • Or important facts

And on the next task, it can read this data back.

That is how it remembers:

  • How you work
  • Which formats you use
  • Or what it did before

Even if the previous conversation ended long ago.
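One of the storage options mentioned above is a plain file. A minimal sketch of what that could look like (the file name `agent_memory.json` and the structure of the stored facts are assumptions for illustration):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # assumed location for this sketch

def save_memory(data: dict) -> None:
    """Persist long-term memory so it survives between tasks."""
    MEMORY_FILE.write_text(json.dumps(data, indent=2))

def load_memory() -> dict:
    """Read memory back at the start of a new task; empty if nothing saved yet."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {}

# At the end of one task:
save_memory({"user:anna": {"report_format": "short-bullets"}})

# At the start of the next task, even after a restart:
prefs = load_memory().get("user:anna", {})
```

The conversation context is gone, but the saved preferences are not.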

Memory limit: context window

Agent short-term memory has limits.

It cannot remember the entire conversation in full.

There is a maximum amount of context the model can "see" at once.

This is called the context window.


When the conversation becomes too long, part of old information simply does not fit anymore.

It falls out of context.

The agent stops considering it.


Because of this, it can:

  • Forget the original requirement
  • Lose an important detail
  • Or repeat an action it already performed earlier
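A context window is measured in tokens, not messages. A rough sketch of how a system might trim history to fit a budget (the 4-characters-per-token estimate and the budget of 15 tokens are illustrative assumptions; real tokenizers differ):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (real tokenizers differ).
    return max(1, len(text) // 4)

def trim_to_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break                           # older messages fall out here
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "Original requirement: weekly report, USD only"},
    {"role": "assistant", "content": "Collecting data..."},
    {"role": "tool", "content": "sales_total=12400"},
]
visible = trim_to_window(history, max_tokens=15)
# The original requirement no longer fits: only the two newest messages remain.
```

This is exactly the failure mode above: the oldest message, which held the original requirement, is the first one to drop out.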

In code this looks like

Below is the same principle in a simple format:
there is short-term memory (task context) and long-term memory (external storage between tasks).

1) Short-term memory: what the agent "sees" now

This is current messages and recent step results:

PYTHON
short_memory = [
    {"role": "user", "content": "Prepare a weekly sales report"},
    {"role": "assistant", "content": "Okay, starting data collection"},
    {"role": "tool", "content": "sales_total=12400"},
]

2) Context limit: old items can fall out

If context is limited, the system keeps only recent items:

PYTHON
MAX_ITEMS = 3
short_memory = short_memory[-MAX_ITEMS:]

Because of this, the agent may not see early instructions.

3) Long-term memory: what we store between tasks

Separately, we keep a storage with useful facts:

PYTHON
long_memory_store = {
    "user:anna": {
        "report_format": "short-bullets",
        "currency": "USD",
    }
}

4) In a new task we read this data back

Before responding, the agent retrieves saved preferences:

PYTHON
user_prefs = long_memory_store.get("user:anna", {})

task_context = {
    "request": "Make a report like last time",
    "prefs": user_prefs,
}

5) The agent builds a response using memory

PYTHON
def build_report(context: dict):
    fmt = context["prefs"].get("report_format", "default")
    currency = context["prefs"].get("currency", "USD")
    return f"Report format={fmt}, currency={currency}"


result = build_report(task_context)
# "Report format=short-bullets, currency=USD"

Without long-term memory, this would be format=default.
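Putting the steps together, one full task cycle might look like this (a sketch under the same assumptions as the snippets above; the in-memory dict stands in for a real database or file):

```python
# Long-term store survives between tasks (a dict standing in for a DB or file).
long_memory_store = {
    "user:anna": {"report_format": "short-bullets", "currency": "USD"},
}

def run_task(user_id: str, request: str) -> str:
    # 1) Fresh short-term memory for this task.
    short_memory = [{"role": "user", "content": request}]

    # 2) Pull long-term facts into the current context.
    prefs = long_memory_store.get(user_id, {})

    # 3) Act using both memories.
    fmt = prefs.get("report_format", "default")
    result = f"Report format={fmt}"
    short_memory.append({"role": "assistant", "content": result})

    # 4) Persist anything worth keeping before the context disappears.
    long_memory_store.setdefault(user_id, {})["last_request"] = request
    return result

first = run_task("user:anna", "Make a report like last time")
# "Report format=short-bullets" -- Anna's saved preference is used.

second = run_task("user:bob", "Make a report like last time")
# "Report format=default" -- Bob has no history, so defaults apply.
```

Short-term memory is rebuilt from scratch on every call; only what was written to the store carries over.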


Analogy from everyday life

Imagine you are on a phone call, but you can hear only the last 30 seconds of it.

You know what the other person just said.
You remember the latest response.
And you can continue the conversation.


But if they say:

"As I explained at the beginning..."

you did not hear that beginning.

It simply dropped out.


And you might:

  • Ask the same thing again
  • Misunderstand the task
  • Or answer off-topic

If you have notes from previous calls, you can read them and restore context.

This is exactly how the agent uses short-term and long-term memory.

In short

Quick take

The agent has two memory types:

  • Short-term: context of the current task
  • Long-term: saved data between tasks

Short-term memory is limited:
part of information can disappear from context.

Long-term memory allows it to:
store experience and use it later.

FAQ

Q: Does an agent remember previous tasks?
A: Only if that information is saved in long-term memory outside the current conversation.

Q: Why can an agent forget the original instruction?
A: Because of context window limits: part of older information can drop out of short-term memory.

Q: Why does an agent need long-term memory?
A: To store important data between tasks and use it in the future.

What's next

Now you know what the agent remembers and how this helps it move forward.

But memory is only part of the picture.

Because the agent does not just remember actions.
It executes them.

And not all actions are equally safe.

One thing is to read data.
Another is to change it.
Or delete it.
Or spend money on API calls.

That is why an agent needs not only to know what to do.
It needs to know what it is allowed to do.

⏱ 7 min read • Updated Mar 2026 • Difficulty: ★★☆
Practical continuation

Pattern implementation examples

Continue with implementation using example projects.

Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.