Idea in 30 Seconds
Memory Layer is not just a fact store but a controlled layer for selecting, writing, and returning memory.
The agent should not send the full history with every request; it reads from memory only what truly helps with the next step. The Memory Layer should not blindly accumulate everything: its job is to return little, but precisely what is needed.
When it is needed: when the agent works not in a single step but across a series of steps or sessions where consistency and personalization matter.
The LLM can only see what is included in the current context. The Memory Layer decides what from the past should be brought back.
Problem
Without a dedicated memory layer, the agent almost always works "from scratch".
This creates typical problems:
- the agent asks again what it already knows;
- answers become inconsistent across sessions;
- too much irrelevant history gets into context;
- important facts are lost in noise;
- duplicates or conflicting versions of the same fact accumulate in memory;
- agent personalizes answers based on stale or weakly supported data.
As a result, cost, latency, and answer errors increase.
Solution
Add a Memory Layer: a dedicated layer for memory operations that decides what to store, what to return to context, and what to delete.
It separates "how to remember" logic from "how to think" logic, so the agent behaves more consistently.
Analogy: like a manager's notes about a client.
A manager does not keep the whole message archive in mind. They pull short relevant notes: what matters for this specific conversation.
Memory Layer does the same by returning only needed memory at the needed moment.
How Memory Layer Works
Memory Layer is a controlled layer between the Agent Runtime and the memory store that decides what to read, what to write, and what to clean up.
Full flow: Retrieve → Rank → Inject → Write → Compact
Retrieve
Runtime asks Memory Layer for facts for the current step.
Rank
Layer selects the most relevant records by topic, recency, and importance.
Inject
Selected memory is added to context before calling LLM.
Write
After the agent step, the layer decides whether a new fact should be stored, considering usefulness, stability, sensitivity, source, and TTL.
Compact
Old or duplicated records are compressed, updated, or removed by TTL/limit rules.
This cycle repeats at every step and helps the agent keep consistency across steps and sessions.
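The Rank step above can be sketched as a weighted score. The weights, the record fields (`relevance`, `created_at`, `importance`), and the exponential recency decay below are illustrative assumptions, not a prescribed formula:

```python
import math
import time

def rank_score(item: dict, now: float, half_life_days: float = 30.0) -> float:
    """Combine relevance, recency, and importance into one score.

    `item` is assumed to carry `relevance` (0..1 similarity to the query),
    `created_at` (unix timestamp), and `importance` (0..1).
    """
    age_days = (now - item["created_at"]) / 86_400
    recency = math.exp(-age_days / half_life_days)  # decays toward 0 with age
    return 0.6 * item["relevance"] + 0.25 * recency + 0.15 * item["importance"]

def top_k(items: list[dict], k: int, now: float) -> list[dict]:
    # Keep only the k highest-scoring records for injection into context.
    return sorted(items, key=lambda it: rank_score(it, now), reverse=True)[:k]
```

The weights and half-life are domain-specific tuning knobs; the point is that ranking should combine topic relevance, recency, and importance rather than rely on similarity alone.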
In Code It Looks Like This
```python
class MemoryLayer:
    def __init__(self, store, max_items_per_user=200):
        self.store = store
        self.max_items_per_user = max_items_per_user

    def retrieve(self, user_id: str, query: str, top_k: int = 4):
        # Return only relevant memory, not full history.
        items = self.store.search(
            user_id=user_id,
            query=query,
            limit=top_k,
            min_score=0.7,
            exclude_expired=True,
        )
        return [item["text"] for item in items]

    def write(
        self,
        user_id: str,
        observation: str,
        tags: list[str],
        source: str = "user",
        sensitivity: str = "low",
        ttl_days: int = 30,
    ):
        if not self._worth_storing(
            observation=observation,
            tags=tags,
            source=source,
            sensitivity=sensitivity,
            ttl_days=ttl_days,
        ):
            return
        self.store.insert(
            user_id=user_id,
            text=observation,
            tags=tags,
            source=source,
            sensitivity=sensitivity,
            ttl_days=ttl_days,
        )
        self.store.enforce_limit(user_id=user_id, max_items=self.max_items_per_user)

    def _worth_storing(
        self,
        observation: str,
        tags: list[str],
        source: str,
        sensitivity: str,
        ttl_days: int,
    ) -> bool:
        text = observation.strip()
        if len(text) < 20:
            return False
        # Do not store short service phrases.
        if text.lower() in {"ok", "thanks", "done", "ready"}:
            return False
        # Do not write sensitive data into standard memory.
        if sensitivity == "high":
            return False
        # Trust only predefined sources.
        if source not in {"user", "tool", "policy"}:
            return False
        # Memory must be stable and useful, not random noise.
        stable_tags = {"preference", "constraint", "profile", "goal"}
        if not any(tag in stable_tags for tag in tags):
            return False
        if ttl_days < 1 or ttl_days > 365:
            return False
        return True
```
What It Looks Like During Execution
Request: "Prepare a weekly meal plan for me"
Step 1
Agent Runtime: calls Memory Layer.retrieve(...)
Memory Layer: returns relevant facts -> ["peanut allergy", "vegetarian diet"]
Agent Runtime: adds these facts to Context
Agent Runtime: calls LLM.decide(...)
Step 2
LLM: returns -> final_answer (plan without peanuts and meat)
Agent Runtime: passes the new observation to Memory Layer.write(...) (e.g. a budget the user mentioned during the dialogue)
Memory Layer: stores the fact -> "user wants budget up to $80/week"
Memory Layer helps the agent not forget important facts and not overload context with extra details.
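The Retrieve → Inject → Write cycle from this walkthrough can be sketched as a single step function. Here `memory` is assumed to expose `retrieve`/`write` like the MemoryLayer above, and `runtime_llm` is any callable that maps a prompt to text:

```python
def agent_step(runtime_llm, memory, user_id: str, query: str) -> str:
    """One agent step: retrieve -> inject -> call LLM -> write back."""
    facts = memory.retrieve(user_id, query)                    # Retrieve + Rank
    context = "\n".join(["Known facts:"] + facts + ["Task: " + query])  # Inject
    answer = runtime_llm(context)                              # LLM call
    memory.write(user_id, observation=answer, tags=["goal"])   # Write
    return answer
```

Note that the runtime, not the model, owns the decision to read and write memory; the LLM only ever sees the injected facts.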
When It Fits - and When It Does Not
Memory Layer is useful when the agent must remember facts across steps or sessions. For one-time requests it is often unnecessary.
Fits
|  | Situation | Why Memory Layer Fits |
|---|---|---|
| ✅ | Agent works with user across multiple sessions | Memory stores important facts and removes repeated clarifications. |
| ✅ | Response personalization is needed | Layer returns user preferences and constraints before the LLM step. |
| ✅ | Agent runs a long multi-step workflow within one run | Layer keeps intermediate conclusions and important facts without constantly inflating context. |
| ✅ | Context grows fast and has limits | Instead of full history, the agent gets only top-k relevant facts. |
Does Not Fit
|  | Situation | Why Memory Layer Does Not Fit |
|---|---|---|
| ❌ | One-shot request without follow-up dialogue | Separate memory layer adds complexity without visible benefit. |
| ❌ | Need facts that change fast: prices, statuses, availability, live data | Here fresh retrieval or tool call is better than relying on memory. |
| ❌ | Product policy forbids storing data across sessions | Long-term memory will violate privacy and compliance requirements. |
In such cases, one model call is often enough:
response = llm(prompt)
Typical Problems and Failures
| Problem | What Happens | How to Prevent |
|---|---|---|
| Stale memory | Agent uses old fact and gives wrong answer | TTL, record versions, and periodic refresh |
| Incorrect personalization | Agent overconfidently personalizes response based on weak or stale memory | Fact freshness checks, confidence threshold, and user confirmation before personalization |
| Noise in memory | Context receives many low-utility records | Write rules, ranking, and top_k limits |
| Cross-user leak | Agent reads memory of another user or tenant | Isolation by user_id/tenant_id and access checks |
| Poisoned memory | Dangerous or false instruction enters memory | Sanitization, trusted sources, manual review of critical records |
| Context limit overflow | Memory volume in context becomes too large for LLM | Compression, deduplication, and short summaries instead of raw history |
Most Memory Layer issues are solved via clear write rules, strong ranking, and access control.
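The cross-user leak row above can be enforced mechanically at the memory boundary. A minimal sketch; the wrapper name, the `allowed_tenants` map, and the inner store's `search` signature are all assumptions:

```python
class IsolatedMemory:
    """Wrapper that refuses cross-tenant reads before touching the store."""

    def __init__(self, store, allowed_tenants: dict[str, str]):
        # allowed_tenants maps user_id -> tenant_id the user belongs to.
        self.store = store
        self.allowed_tenants = allowed_tenants

    def retrieve(self, caller_tenant: str, user_id: str, query: str):
        if self.allowed_tenants.get(user_id) != caller_tenant:
            raise PermissionError(
                f"tenant {caller_tenant!r} may not read memory of {user_id!r}"
            )
        return self.store.search(user_id=user_id, query=query)
```

Putting the check in a wrapper, rather than in every call site, makes it hard to forget.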
How It Combines with Other Patterns
Memory Layer does not control the whole agent. It is responsible only for high-quality memory operations at each step.
- Agent Runtime - Runtime decides when to access memory, and Memory Layer decides what exactly to read, write, or delete.
- Tool Execution Layer - tool calls can read or update memory through the controlled execution layer.
- Memory-Augmented Agent - this pattern directly relies on Memory Layer.
- RAG Agent - RAG retrieves external knowledge, while Memory Layer keeps internal experience of a specific agent/user.
In other words:
- Agent Runtime defines when the agent accesses memory
- Memory Layer defines what exactly is stored and returned back
How This Differs from Agent Runtime
|  | Agent Runtime | Memory Layer |
|---|---|---|
| What it controls | Whole agent loop | Memory writing, retrieval, and quality |
| When it works | At each execution-loop step | During memory read/write |
| What it returns | Next state or final answer | Relevant facts for context |
| Main risk | Wrong loop and limit control | Stale, noisy, or unsafe records |
Agent Runtime is the "conductor" of the whole process.
Memory Layer is the "system memory" that keeps response consistency.
In Short
Memory Layer:
- stores important facts across steps and sessions
- returns only relevant records to context
- compresses or deletes stale memory by rules
- protects data through access isolation and write rules
FAQ
Q: Is it enough to pass the full history into the prompt?
A: For short scenarios, sometimes yes. But in long dialogues it is expensive, slow, and noisy. Memory Layer returns short, relevant memory instead of the full log.
Q: How is short-term memory different from long-term memory?
A: Short-term is needed for current run or session. Long-term stores important facts across sessions and is reused later.
Q: Can we write every agent step into memory?
A: Technically yes, but it is poor practice. Better store only useful facts by write rules; otherwise memory quickly turns into noise.
Q: Does Memory Layer replace RAG or tool calls?
A: No. Memory Layer stores internal facts and agent/user experience. For fresh external data, retrieval or tool calls are usually required.
What Next
Memory is useful only when it stays controlled. Next, see where memory connects to execution and policy:
- Agent Runtime - how runtime pulls memory into each iteration.
- Multi-Tenant - how to avoid mixing context and data between customers.
- Policy Boundaries - how to restrict access to sensitive memory.
- Production Stack - how to combine memory quality, audit, and operational control.