RETURN_TO_BLOG
AI & Automation 15 min

AI Agent Memory — How to Make Your Chatbot and Agent Remember Users Across Sessions

AI agent memory is the mechanism that lets a language model retain information beyond a single conversation — because the LLM itself is stateless and forgets everything once the session ends. In practice you build it at two levels: short-term memory is managing the context window within one conversation (through summaries and compression), and long-term memory is storing facts and events in an external database (usually a vector store) and retrieving them in later sessions using the RAG pattern. If your agent needs to remember a customer's preferences, the context of previous tickets, or conclusions from earlier analyses — you need long-term memory, not a bigger context window. Ready-made frameworks (Mem0, Zep, Letta, LangMem) give you this without building from scratch.

The complete guide to AI agent memory: why LLMs are stateless, how working, episodic, semantic and procedural memory differ, how to manage the context window through compression and summarization, how the Mem0, Zep, Letta and LangMem frameworks work, how to implement memory in code and n8n, and how to reconcile it with GDPR.

A customer writes to your chatbot: "the same problem as last time again." The agent replies: "Could you describe the problem?" The customer already knows they are talking to a machine with no memory — and trust evaporates. Meanwhile a human support agent would open the history and say: "I see you reported this two weeks ago, let's check whether the fix worked."

That difference — remembering context between conversations — separates a demo toy from an agent you can deploy in a company. And because a language model inherently remembers nothing between calls, memory has to be designed and built separately. This article shows how: from memory types, through context management and frameworks, to code, n8n and GDPR.

Why an LLM does not remember — the statelessness problem

Every call to a language model is independent. The model has no "state" between requests — it takes text in, returns text out and immediately forgets. The impression that ChatGPT "remembers" a conversation is an illusion: each time, the application sends the entire conversation history so far to the model as part of the prompt. It is not the model remembering — it is the application re-attaching the history.

This mechanism works up to a point, then hits three hard walls:

  • Context window limit — even models with 200k–1M token windows have an upper bound; a long conversation or large documents eventually will not fit
  • Cost and latency — you pay for every input token on every call; re-attaching the full history to each request means cost grows quadratically with conversation length
  • No continuity across sessions — when the user closes the chat and returns tomorrow, the history is empty; without external storage the agent starts from zero

AI agent memory solves all three: instead of re-attaching everything, it stores information externally and retrieves only what is relevant to the current decision. This is exactly the same idea as RAG for company knowledge (I covered it in the article on building a knowledge base) — just applied to interaction history and facts about the user.

The four types of AI agent memory

/// AI AGENT MEMORY ARCHITECTURE

4 types of agent memory — each lives elsewhere

01SHORT-TERM
Working memory
Current sessionContext window / Redis
What the agent "sees" now: conversation history, tool results, loaded files
02LONG-TERM
Episodic memory
Across sessionsEvent log / database
What the agent did and when: conversation traces, decisions, audit trails
03LONG-TERM
Semantic memory
PersistentVector database (RAG)
Facts about the world and user: preferences, domain knowledge, company data
04LONG-TERM
Procedural memory
PersistentPrompts / skills / code
How to perform a task: learned procedures, reusable skills, routines
2
HORIZONS SHORT- AND LONG-TERM
80%
FEWER TOKENS VIA MEMORY COMPRESSION
RAG
RETRIEVAL PATTERN FOR SEMANTIC MEMORY

Research on agent architecture (including "Cognitive Architectures for Language Agents") distinguishes four memory types borrowed from cognitive psychology. Understanding the differences is crucial, because each type needs a different store and a different retrieval strategy — and the most common mistake is dumping everything into one vector database.

  • Working memory — the active context window: the current conversation, loaded files, tool results from this session. You manage it like a token budget, not a search problem — through compression and prioritization, not similarity search
  • Episodic memory — a record of what the agent did and when: conversation traces, decisions made, action trails. Used for auditing, debugging and learning from history. The key is chronological storage, not similarity search
  • Semantic memory — facts about the world, the user and the domain: customer preferences, industry knowledge, company data. RAG was built for this — content-similarity retrieval is the right approach here
  • Procedural memory — how to perform a task: learned procedures, reusable skills, routines. In practice stored in system prompts, tool definitions and the agent's code

The most important design rule: do not mix episodic memory (event logs) with semantic memory (facts) in one vector index. Similarity search over event logs degrades retrieval quality for both — a log "user clicked X at 14:32" and a fact "user prefers email contact" require completely different access strategies.

Short-term memory — managing the context window

Short-term memory is managing what fits in the context window during one session. As a conversation or an agent's actions grow longer, tool observations can consume 70–80% of the token budget — and they need intelligent reduction. Here are the main techniques:

TechniqueHow it worksWhen to use
Sliding windowKeep the last N turns in full, discard older onesSimple chats where only fresh context matters
Rolling summaryLast N turns in full + a concise summary of everything olderLong conversations where early context still matters
CompactionAt a token threshold an LLM compresses history, preserving decisionsMulti-step agents with many tool calls
Prompt compressionToken-level pruning (e.g. LLMLingua) removes low-information tokensWhen you need maximum reduction while keeping content
Tool result limitingCap tool response length before it enters the contextTools returning large JSON blobs or whole documents

In production, two approaches are separated. "Prevention" agents structurally bound context growth — they limit message scope and trim tool results immediately. "Cure" agents let context grow and compress only past a token threshold, triggering LLM-based summarization. For most business use cases a rolling summary is enough: keep the last 8–10 exchanges in full and maintain everything older as a living, updated summary.

Watch the trap: compression is lossy. Every summary loses detail, and if the agent summarizes summaries, after a few iterations it is left with a vague caricature of the conversation. That is why you should extract important facts (order number, agreements, decisions) into long-term memory as structured data before they go under the compression knife.

Long-term memory — how an agent remembers across sessions

Long-term memory lives outside the context window — in an external database, usually a vector store — and survives closing the chat, restarting the server or the user returning a week later. Its lifecycle has three phases:

  1. 1.Write — during or after a conversation the agent extracts important facts and stores them. Key: you do not store the raw transcript, but extracted, structured information ("customer X prefers courier delivery", "project Y has a June 30 deadline")
  2. 2.Retrieve — before the agent responds, it searches semantic memory for facts relevant to the current query and injects them into the context window. This is the core RAG pattern: keep knowledge outside, pull in only what is needed
  3. 3.Update — when new information contradicts old information, memory must be updated, not just appended. Otherwise you accumulate conflicting facts ("prefers email" and "prefers phone") and the agent will guess

The last phase is the hardest and most often skipped. Good memory frameworks do deduplication and conflict resolution automatically — they detect that a new fact replaces an old one and overwrite it instead of multiplying versions. That is why it is worth reaching for a ready-made framework instead of building memory on a raw vector database: writing and retrieving you can code in an hour, but managing the lifecycle of facts is months of refinement.

Memory frameworks: Mem0, Zep, Letta, LangMem

/// MEM0 vs ZEP vs LETTA vs LANGMEM — WHICH MEMORY FRAMEWORK?

Mem0
MOST POPULAR
ApproachVectors + graph
GitHub⭐ 48k+
HostingSelf-host / SaaS
Token reductionup to 80%
Entry barrierLow
Best forFast start, flexibility
Zep
TOP ACCURACY
ApproachTemporal graph (Graphiti)
Benchmark63.8% LongMemEval
HostingSelf-host / SaaS
StrengthRelationships over time
Entry barrierMedium
Best forEntities & relations, CRM
Letta
AGENT RUNTIME
ApproachSelf-editing memory
HeritageMemGPT
HostingSelf-host / cloud
StrengthStateful, full control
Entry barrierHigh
Best forLong-running agent
LangMem
NATIVE LANGGRAPH
ApproachLangChain memory SDK
IntegrationLangGraph / LangChain
HostingSelf-host
StrengthFits orchestration
Entry barrierLow in LC stack
Best forTeams on LangGraph
4
PRODUCTION-READY FRAMEWORKS
63.8%
TOP LONGMEMEVAL SCORE (ZEP)
0
MEMORY CODE WITH MANAGED SaaS

By 2026 the agent memory market has matured enough that you do not have to build it yourself. Four frameworks dominate, each with a different architectural approach:

FrameworkArchitectureStrengthBest for
Mem0Vectors + optional graphFastest start, ~48k stars, up to 80% token reductionMost companies — the default choice
Zep (Graphiti)Temporal knowledge graphTop accuracy (63.8% LongMemEval), relationships over timeApps with entities and relations (CRM, contact networks)
Letta (MemGPT)Self-editing memory in an agent runtimeFull control over a long-running agent's stateComplex, autonomous agents
LangMemMemory SDK for LangChainNative LangGraph integrationTeams already building on LangChain

The decision rule in practice:

  • Start with Mem0 unless you have a strong reason to choose otherwise — lowest entry barrier, the managed SaaS removes graph-database and scaling overhead, and self-hosting is available for privacy requirements
  • Choose Zep when understanding relationships and how they change over time is key — who works with whom, how preferences evolve, links between entities; here a temporal graph beats a plain vector store
  • Consider Letta when you are building an autonomous agent running for hours or days and need memory as a first-class runtime element, not a library bolted on the side
  • Use LangMem if your stack is already LangGraph — you get memory tooling that fits your existing orchestration without adding a new dependency

There is no single winner — the choice depends on whether you value deployment speed (Mem0), accuracy and relationships (Zep), control (Letta) or stack consistency (LangMem).

How to implement memory in code — a Mem0 example

The simplest route is Mem0. Below is a customer support agent that writes and retrieves memory per user — just a few lines beyond a normal model call:

agent_with_memory.py
from mem0 import Memoryfrom openai import OpenAImemory = Memory()client = OpenAI()def chat(user_id: str, message: str) -> str:    # 1. Retrieve facts relevant to the current query    relevant = memory.search(query=message, user_id=user_id, limit=5)    context = "\n".join(m["memory"] for m in relevant["results"])    # 2. Inject memory into the system prompt    system = (        "You are a customer support assistant. "        "Known facts about the user:\n" + (context or "(none)")    )    resp = client.chat.completions.create(        model="gpt-4o-mini",        messages=[            {"role": "system", "content": system},            {"role": "user", "content": message},        ],    )    answer = resp.choices[0].message.content    # 3. Store new facts from this exchange (extraction happens automatically)    memory.add(        messages=[            {"role": "user", "content": message},            {"role": "assistant", "content": answer},        ],        user_id=user_id,    )    return answerThree things that make a difference here:- **user_id isolates memory** — each customer has their own space; Mem0 never mixes two users' facts, which is critical for privacy- **memory.add() does not store the raw conversation** — under the hood an LLM extracts facts worth remembering and discards small talk; you do not clutter the database with "hello" and "thanks"- **search() retrieves only the top 5** — you inject a few of the most relevant facts into context, not the whole history; cost and latency stay constant no matter how long the customer has been with youFor self-hosting (article #40) you swap OpenAI for a local Ollama or vLLM endpoint, and configure Mem0 to use a local model for extraction and a local vector store — all memory stays in your infrastructure.

Memory in n8n and no-code tools

Not every deployment needs code. In n8n the AI Agent node has built-in memory options you configure by clicking:

  • Simple Memory (window buffer) — keeps the last N messages in instance memory; short-term memory within one workflow, the simplest but volatile (gone after a restart)
  • Memory in an external database — connect Postgres, Redis or a vector store as the store; memory survives restarts and works across sessions
  • Mem0 / Zep via an HTTP node or a dedicated integration — full long-term memory with fact extraction, without writing code

The key in n8n is the session key — the identifier the agent uses to know whose memory to load. Usually it is a CRM customer ID, a phone number or an email address. Without a stable session key every conversation is anonymous and memory does not work across sessions. This is exactly the same mechanism as user_id in code — just set in the interface.

For simple cases (a FAQ assistant remembering context within one conversation) Simple Memory is enough. For an agent that should recognize a returning customer and their history — you need memory in an external database with a sensible session key.

Memory security and privacy (GDPR)

Agent memory is by definition a store of personal data — preferences, contact history, sometimes sensitive data. That places it squarely under GDPR and requires designing compliance from the start, not after the fact:

  • Right to be forgotten (Art. 17) — you must be able to delete a specific user's entire memory on request; this is why isolation by user_id is not just good practice but a requirement — without it you cannot erase one person's data without touching others'
  • Data minimization (Art. 5) — store only facts needed for the agent to function; fact extraction (instead of raw transcripts) helps, but configure it so it does not capture sensitive data without a legal basis
  • Tenant isolation (multi-tenancy) — in an app serving multiple companies, one client's memory must never leak into another's context; test this deliberately, because an isolation bug is a data breach
  • Encryption and data location — for sensitive data consider self-hosting memory (a local vector store + a local extraction model) so data never leaves your infrastructure — this is an argument for self-hosted Mem0/Zep over managed SaaS
  • Prompt injection via memory — if the agent writes user-supplied content to memory, an attacker can inject an instruction that executes on the next retrieval; memory is another vector from the prompt injection article (#39) — treat stored content as untrusted data

The practical takeaway: before deploying memory to production, answer the question "how do I delete one user's data on request?" If you do not have a simple answer, the memory architecture is not GDPR-ready.

AI agent memory deployment checklist

  1. 1.Separate memory types: working (context), episodic (logs), semantic (facts) — do not dump everything into one vector database
  2. 2.For short-term memory start with a rolling summary: last 8–10 exchanges in full + a living summary of older ones
  3. 3.Extract important facts into long-term memory as structured data before compression loses them
  4. 4.Choose a framework: Mem0 as the default, Zep for relationships over time, Letta for autonomous agents, LangMem for a LangGraph stack
  5. 5.Isolate memory by user_id / session key from day one — it is the foundation of privacy and GDPR
  6. 6.Configure fact extraction instead of storing raw transcripts — smaller database, lower cost, less personal data
  7. 7.Handle update and deduplication: a new fact should overwrite a conflicting old one, not pile up beside it
  8. 8.Retrieve only the top-K relevant facts (3–5) — cost and latency stay constant regardless of history
  9. 9.For sensitive data consider self-hosting memory (local store + local extraction model)
  10. 10.Treat stored user content as untrusted data — memory is a prompt injection vector
  11. 11.Implement per-user memory deletion (right to be forgotten) before production
  12. 12.Measure quality: does the agent retrieve the right facts? Test on realistic returning-user scenarios

Key takeaways

An LLM is stateless — memory must be built separately, at two levels. Short-term is managing the context window through summaries and compression; long-term is writing facts to an external store and retrieving them with the RAG pattern. Separate the four memory types (working, episodic, semantic, procedural) and do not dump everything into one vector database. Do not build from scratch — Mem0 is the default choice, Zep for relationships over time, Letta for autonomous agents, LangMem for LangGraph. From day one, isolate memory by user_id, extract facts instead of transcripts and plan per-user deletion — because agent memory is a store of personal data under the full GDPR regime and another prompt injection vector.

---

I help companies design and deploy memory for AI agents and chatbots — from choosing the architecture and framework (Mem0, Zep, Letta), through integration with code or n8n, to GDPR compliance and security. Get in touch — I start with a free 30-minute analysis of your use case.

/// AUTHOR
Paweł Wiszniewski – AI & Web Engineer

Paweł Wiszniewski

SEO & GEO Specialist & AI Engineer

SEO/GEO specialist (10 years) and AI engineer (3 years). I build search visibility, AI systems and automations that reduce costs and improve operational efficiency.

Signal received?

Terminate
Silence

Initiate protocol. Establish connection. Let's build something loud.

> WAITING_FOR_INPUT...