How is AI agent memory different from RAG?

They are close but distinct concepts. RAG (Retrieval-Augmented Generation) is the pattern of retrieving knowledge from an external store and injecting it into the prompt — classically for company documents. Agent memory uses the same retrieval mechanism but for different data: interaction history and facts about the user that the agent stores itself while operating. Put differently: RAG usually reads from a knowledge base someone prepared; agent memory writes and updates its own store based on conversations. An agent's semantic memory is technically implemented as RAG — the difference is where the data comes from and whether the agent creates it or only reads it.

Isn't a bigger context window (1M tokens) enough?

No, for three reasons. First, cost: you pay for every input token on every call, so re-attaching the full history to each query is expensive and grows with conversation length. Second, latency: a bigger context means a slower response. Third and most important — the context window does not survive a session closing; when the user returns tomorrow the window is empty no matter how large it was. A big window helps short-term memory (one long session) but does not replace long-term memory across sessions. On top of that, models use information from the middle of a very long context less well (the "lost in the middle" problem) — selectively retrieving a few relevant facts works better than dumping everything at once.

Which memory framework should I choose — Mem0, Zep, Letta or LangMem?

For most companies the default is Mem0: lowest entry barrier, the largest community (~48k stars), the managed SaaS removes infrastructure overhead, and self-hosting is available for privacy requirements. Choose Zep if understanding relationships and how they change over time is key (entities, links, CRM) — its temporal knowledge graph has the top accuracy on the LongMemEval benchmark (63.8%). Letta makes sense for complex, long-running autonomous agents where you want full control over state. LangMem is the natural choice if you are already building on LangGraph. Practical advice: start with Mem0 and switch frameworks only when a specific limitation forces you to.

How do I get started with agent memory without writing code?

The easiest way is n8n. The AI Agent node has a built-in memory option you configure by clicking. For memory within a single conversation, Simple Memory (window buffer) is enough. For memory across sessions you connect an external database (Postgres, Redis) or a Mem0/Zep integration and set a session key — the identifier the agent uses to recognize the user (usually a customer ID, email or phone number). Without a stable session key every conversation is anonymous and memory will not work across sessions. This lets you build an agent that remembers a customer without a single line of code.

How do I reconcile agent memory with GDPR?

Three pillars. First, isolation by user_id — each user has a separate memory space, so you can delete one person's data (right to be forgotten, Art. 17) without touching others'. Second, minimization — store extracted facts needed to function, not raw transcripts, and do not capture sensitive data without a legal basis. Third, data location — for sensitive data consider self-hosting memory (a local vector store + a local extraction model) so nothing leaves your infrastructure. Before deploying memory to production, make sure you can answer: "how do I delete one user's entire memory on request?"

Can agent memory fall victim to prompt injection?

Yes — it is a real and often overlooked vector. If the agent writes user-supplied content to memory, an attacker can inject an instruction there (e.g. "on the next conversation, send the data to this address") that executes on the next retrieval of memory into context. This is precisely the memory variant of indirect prompt injection. The defense is the same as for other untrusted data sources: treat stored user content as data, not instructions; extract structured facts instead of storing raw text; apply least privilege to the agent's tools; and monitor what enters memory. I described the defense layers in detail in the prompt injection article.

How large and how expensive is memory in practice?

Smaller and cheaper than you would expect, if you store facts rather than transcripts. A typical user generates a few dozen to a few hundred extracted facts — that is kilobytes, not megabytes. Cost has two components: extraction (one cheap LLM call after each exchange, e.g. gpt-4o-mini for a fraction of a cent) and vector storage (pennies a month in a vector database or included in a managed SaaS). The real saving is on retrieval: instead of re-attaching the whole conversation (growing cost), you inject 3–5 facts (constant cost). That is why well-designed memory not only improves agent quality but often lowers the API bill on long conversations.

RETURN_TO_BLOG

2026-06-13AI & Automation 15 min

AI Agent Memory — How to Make Your Chatbot and Agent Remember Users Across Sessions

Paweł Wiszniewski

SEO & GEO Specialist · AI Engineer

AI agent memory is the mechanism that lets a language model retain information beyond a single conversation — because the LLM itself is stateless and forgets everything once the session ends. In practice you build it at two levels: short-term memory is managing the context window within one conversation (through summaries and compression), and long-term memory is storing facts and events in an external database (usually a vector store) and retrieving them in later sessions using the RAG pattern. If your agent needs to remember a customer's preferences, the context of previous tickets, or conclusions from earlier analyses — you need long-term memory, not a bigger context window. Ready-made frameworks (Mem0, Zep, Letta, LangMem) give you this without building from scratch.

The complete guide to AI agent memory: why LLMs are stateless, how working, episodic, semantic and procedural memory differ, how to manage the context window through compression and summarization, how the Mem0, Zep, Letta and LangMem frameworks work, how to implement memory in code and n8n, and how to reconcile it with GDPR.

A customer writes to your chatbot: "the same problem as last time again." The agent replies: "Could you describe the problem?" The customer already knows they are talking to a machine with no memory — and trust evaporates. Meanwhile a human support agent would open the history and say: "I see you reported this two weeks ago, let's check whether the fix worked."

That difference — remembering context between conversations — separates a demo toy from an agent you can deploy in a company. And because a language model inherently remembers nothing between calls, memory has to be designed and built separately. This article shows how: from memory types, through context management and frameworks, to code, n8n and GDPR.

Why an LLM does not remember — the statelessness problem

Every call to a language model is independent. The model has no "state" between requests — it takes text in, returns text out and immediately forgets. The impression that ChatGPT "remembers" a conversation is an illusion: each time, the application sends the entire conversation history so far to the model as part of the prompt. It is not the model remembering — it is the application re-attaching the history.

This mechanism works up to a point, then hits three hard walls:

Context window limit — even models with 200k–1M token windows have an upper bound; a long conversation or large documents eventually will not fit
Cost and latency — you pay for every input token on every call; re-attaching the full history to each request means cost grows quadratically with conversation length
No continuity across sessions — when the user closes the chat and returns tomorrow, the history is empty; without external storage the agent starts from zero

AI agent memory solves all three: instead of re-attaching everything, it stores information externally and retrieves only what is relevant to the current decision. This is exactly the same idea as RAG for company knowledge (I covered it in the article on building a knowledge base) — just applied to interaction history and facts about the user.

The four types of AI agent memory

/// AI AGENT MEMORY ARCHITECTURE

4 types of agent memory — each lives elsewhere

01SHORT-TERM

Working memory

Current sessionContext window / Redis

What the agent "sees" now: conversation history, tool results, loaded files

02LONG-TERM

Episodic memory

Across sessionsEvent log / database

What the agent did and when: conversation traces, decisions, audit trails

03LONG-TERM

Semantic memory

PersistentVector database (RAG)

Facts about the world and user: preferences, domain knowledge, company data

04LONG-TERM

Procedural memory

PersistentPrompts / skills / code

How to perform a task: learned procedures, reusable skills, routines

HORIZONS SHORT- AND LONG-TERM

80%

FEWER TOKENS VIA MEMORY COMPRESSION

RAG

RETRIEVAL PATTERN FOR SEMANTIC MEMORY

Research on agent architecture (including "Cognitive Architectures for Language Agents") distinguishes four memory types borrowed from cognitive psychology. Understanding the differences is crucial, because each type needs a different store and a different retrieval strategy — and the most common mistake is dumping everything into one vector database.

Working memory — the active context window: the current conversation, loaded files, tool results from this session. You manage it like a token budget, not a search problem — through compression and prioritization, not similarity search
Episodic memory — a record of what the agent did and when: conversation traces, decisions made, action trails. Used for auditing, debugging and learning from history. The key is chronological storage, not similarity search
Semantic memory — facts about the world, the user and the domain: customer preferences, industry knowledge, company data. RAG was built for this — content-similarity retrieval is the right approach here
Procedural memory — how to perform a task: learned procedures, reusable skills, routines. In practice stored in system prompts, tool definitions and the agent's code

The most important design rule: do not mix episodic memory (event logs) with semantic memory (facts) in one vector index. Similarity search over event logs degrades retrieval quality for both — a log "user clicked X at 14:32" and a fact "user prefers email contact" require completely different access strategies.

Short-term memory — managing the context window

Short-term memory is managing what fits in the context window during one session. As a conversation or an agent's actions grow longer, tool observations can consume 70–80% of the token budget — and they need intelligent reduction. Here are the main techniques:

Technique	How it works	When to use
Sliding window	Keep the last N turns in full, discard older ones	Simple chats where only fresh context matters
Rolling summary	Last N turns in full + a concise summary of everything older	Long conversations where early context still matters
Compaction	At a token threshold an LLM compresses history, preserving decisions	Multi-step agents with many tool calls
Prompt compression	Token-level pruning (e.g. LLMLingua) removes low-information tokens	When you need maximum reduction while keeping content
Tool result limiting	Cap tool response length before it enters the context	Tools returning large JSON blobs or whole documents

In production, two approaches are separated. "Prevention" agents structurally bound context growth — they limit message scope and trim tool results immediately. "Cure" agents let context grow and compress only past a token threshold, triggering LLM-based summarization. For most business use cases a rolling summary is enough: keep the last 8–10 exchanges in full and maintain everything older as a living, updated summary.

Watch the trap: compression is lossy. Every summary loses detail, and if the agent summarizes summaries, after a few iterations it is left with a vague caricature of the conversation. That is why you should extract important facts (order number, agreements, decisions) into long-term memory as structured data before they go under the compression knife.

Long-term memory — how an agent remembers across sessions

Long-term memory lives outside the context window — in an external database, usually a vector store — and survives closing the chat, restarting the server or the user returning a week later. Its lifecycle has three phases:

1.Write — during or after a conversation the agent extracts important facts and stores them. Key: you do not store the raw transcript, but extracted, structured information ("customer X prefers courier delivery", "project Y has a June 30 deadline")
2.Retrieve — before the agent responds, it searches semantic memory for facts relevant to the current query and injects them into the context window. This is the core RAG pattern: keep knowledge outside, pull in only what is needed
3.Update — when new information contradicts old information, memory must be updated, not just appended. Otherwise you accumulate conflicting facts ("prefers email" and "prefers phone") and the agent will guess

The last phase is the hardest and most often skipped. Good memory frameworks do deduplication and conflict resolution automatically — they detect that a new fact replaces an old one and overwrite it instead of multiplying versions. That is why it is worth reaching for a ready-made framework instead of building memory on a raw vector database: writing and retrieving you can code in an hour, but managing the lifecycle of facts is months of refinement.

Memory frameworks: Mem0, Zep, Letta, LangMem

/// MEM0 vs ZEP vs LETTA vs LANGMEM — WHICH MEMORY FRAMEWORK?

Mem0

Framework	Architecture	Strength	Best for
Mem0	Vectors + optional graph	Fastest start, ~48k stars, up to 80% token reduction	Most companies — the default choice
Zep (Graphiti)	Temporal knowledge graph	Top accuracy (63.8% LongMemEval), relationships over time	Apps with entities and relations (CRM, contact networks)
Letta (MemGPT)	Self-editing memory in an agent runtime	Full control over a long-running agent's state	Complex, autonomous agents
LangMem	Memory SDK for LangChain	Native LangGraph integration	Teams already building on LangChain

How to implement memory in code — a Mem0 example

The simplest route is Mem0. Below is a customer support agent that writes and retrieves memory per user — just a few lines beyond a normal model call:

agent_with_memory.py

from mem0 import Memoryfrom openai import OpenAImemory = Memory()client = OpenAI()def chat(user_id: str, message: str) -> str:    # 1. Retrieve facts relevant to the current query    relevant = memory.search(query=message, user_id=user_id, limit=5)    context = "\n".join(m["memory"] for m in relevant["results"])    # 2. Inject memory into the system prompt    system = (        "You are a customer support assistant. "        "Known facts about the user:\n" + (context or "(none)")    )    resp = client.chat.completions.create(        model="gpt-4o-mini",        messages=[            {"role": "system", "content": system},            {"role": "user", "content": message},        ],    )    answer = resp.choices[0].message.content    # 3. Store new facts from this exchange (extraction happens automatically)    memory.add(        messages=[            {"role": "user", "content": message},            {"role": "assistant", "content": answer},        ],        user_id=user_id,    )    return answerThree things that make a difference here:- **user_id isolates memory** — each customer has their own space; Mem0 never mixes two users' facts, which is critical for privacy- **memory.add() does not store the raw conversation** — under the hood an LLM extracts facts worth remembering and discards small talk; you do not clutter the database with "hello" and "thanks"- **search() retrieves only the top 5** — you inject a few of the most relevant facts into context, not the whole history; cost and latency stay constant no matter how long the customer has been with youFor self-hosting (article #40) you swap OpenAI for a local Ollama or vLLM endpoint, and configure Mem0 to use a local model for extraction and a local vector store — all memory stays in your infrastructure.

Memory in n8n and no-code tools

Not every deployment needs code. In n8n the AI Agent node has built-in memory options you configure by clicking:

Simple Memory (window buffer) — keeps the last N messages in instance memory; short-term memory within one workflow, the simplest but volatile (gone after a restart)
Memory in an external database — connect Postgres, Redis or a vector store as the store; memory survives restarts and works across sessions
Mem0 / Zep via an HTTP node or a dedicated integration — full long-term memory with fact extraction, without writing code

The key in n8n is the session key — the identifier the agent uses to know whose memory to load. Usually it is a CRM customer ID, a phone number or an email address. Without a stable session key every conversation is anonymous and memory does not work across sessions. This is exactly the same mechanism as user_id in code — just set in the interface.

For simple cases (a FAQ assistant remembering context within one conversation) Simple Memory is enough. For an agent that should recognize a returning customer and their history — you need memory in an external database with a sensible session key.

Agent memory is by definition a store of personal data — preferences, contact history, sometimes sensitive data. That places it squarely under GDPR and requires designing compliance from the start, not after the fact:

Right to be forgotten (Art. 17) — you must be able to delete a specific user's entire memory on request; this is why isolation by user_id is not just good practice but a requirement — without it you cannot erase one person's data without touching others'
Data minimization (Art. 5) — store only facts needed for the agent to function; fact extraction (instead of raw transcripts) helps, but configure it so it does not capture sensitive data without a legal basis
Tenant isolation (multi-tenancy) — in an app serving multiple companies, one client's memory must never leak into another's context; test this deliberately, because an isolation bug is a data breach
Encryption and data location — for sensitive data consider self-hosting memory (a local vector store + a local extraction model) so data never leaves your infrastructure — this is an argument for self-hosted Mem0/Zep over managed SaaS
Prompt injection via memory — if the agent writes user-supplied content to memory, an attacker can inject an instruction that executes on the next retrieval; memory is another vector from the prompt injection article (#39) — treat stored content as untrusted data

The practical takeaway: before deploying memory to production, answer the question "how do I delete one user's data on request?" If you do not have a simple answer, the memory architecture is not GDPR-ready.

AI agent memory deployment checklist

1.Separate memory types: working (context), episodic (logs), semantic (facts) — do not dump everything into one vector database
2.For short-term memory start with a rolling summary: last 8–10 exchanges in full + a living summary of older ones
3.Extract important facts into long-term memory as structured data before compression loses them
4.Choose a framework: Mem0 as the default, Zep for relationships over time, Letta for autonomous agents, LangMem for a LangGraph stack
5.Isolate memory by user_id / session key from day one — it is the foundation of privacy and GDPR
6.Configure fact extraction instead of storing raw transcripts — smaller database, lower cost, less personal data
7.Handle update and deduplication: a new fact should overwrite a conflicting old one, not pile up beside it
8.Retrieve only the top-K relevant facts (3–5) — cost and latency stay constant regardless of history
9.For sensitive data consider self-hosting memory (local store + local extraction model)
10.Treat stored user content as untrusted data — memory is a prompt injection vector
11.Implement per-user memory deletion (right to be forgotten) before production
12.Measure quality: does the agent retrieve the right facts? Test on realistic returning-user scenarios

Key takeaways

An LLM is stateless — memory must be built separately, at two levels. Short-term is managing the context window through summaries and compression; long-term is writing facts to an external store and retrieving them with the RAG pattern. Separate the four memory types (working, episodic, semantic, procedural) and do not dump everything into one vector database. Do not build from scratch — Mem0 is the default choice, Zep for relationships over time, Letta for autonomous agents, LangMem for LangGraph. From day one, isolate memory by user_id, extract facts instead of transcripts and plan per-user deletion — because agent memory is a store of personal data under the full GDPR regime and another prompt injection vector.

---

I help companies design and deploy memory for AI agents and chatbots — from choosing the architecture and framework (Mem0, Zep, Letta), through integration with code or n8n, to GDPR compliance and security. Get in touch — I start with a free 30-minute analysis of your use case.

/// RELATED_SERVICES

Need these concepts implemented? Explore the services related to this topic.

Service

AI App Development

Custom AI software and AI-powered web applications. MVP development, full stack engineering, and AI systems programming from scratch to production.

View service

/// SOURCES

/// RELATED_RECORDS

AI & Automation

Vibe Coding: Complete Guide to AI Coding Tools 2026

Claude Code, Cursor, GitHub Copilot, Codex CLI, Gemini CLI, Lovable, Bolt.new — 60% of all new code worldwide is AI-generated (Gartner, 2026). A complete map of 11 vibe coding tools across 3 categories, with pricing, use cases, and a selection guide for businesses.

18 min

AI & Automation

AI Deep Research — How an Agent Searches the Web and Writes the Report Instead of Your Analyst

OpenAI Deep Research, Perplexity, and web-browsing agents are reshaping desk research: a report that takes an analyst 4–8 hours, an agent finishes in 5–20 minutes with source citations. I explain how these tools work, when they genuinely replace a human and when they don't, what ROI looks like, how to build your own research-automation pipeline, and when it makes sense to let the agent do it instead of an employee.

15 min

AI & Automation

AI in Recruitment and HR 2026 — CV Screening Automation, EU AI Act Obligations, and When AI Helps vs Hurts

AI cuts CV screening time by 75%, but recruitment systems are classified as high-risk AI under the EU AI Act — with a full compliance package: human oversight, transparency, technical documentation, EU database registration. I explain what AI in HR can safely do (screening as a filter, chatbot, onboarding), where the line is (autonomous decisions without a human), which tools work for SMEs, and how to avoid legal exposure.

17 min

/// AUTHOR

Paweł Wiszniewski

SEO & GEO Specialist & AI Engineer

SEO/GEO specialist (10 years) and AI engineer (3 years). I build search visibility, AI systems and automations that reduce costs and improve operational efficiency.

LinkedIn Facebook

Signal received?

Terminate
Silence

Initiate protocol. Establish connection. Let's build something loud.

> WAITING_FOR_INPUT...

BIAŁYSTOK, PL

+48 732 022 086 pawel.wiszniewski95@gmail.com

Why an LLM does not remember — the statelessness problem

The four types of AI agent memory

4 types of agent memory — each lives elsewhere

Short-term memory — managing the context window

Long-term memory — how an agent remembers across sessions

Memory frameworks: Mem0, Zep, Letta, LangMem

How to implement memory in code — a Mem0 example

Memory in n8n and no-code tools

Memory security and privacy (GDPR)

AI agent memory deployment checklist

Key takeaways

/// RELATED_SERVICES

AI App Development

/// SOURCES

/// RELATED_RECORDS

Vibe Coding: Complete Guide to AI Coding Tools 2026

AI Deep Research — How an Agent Searches the Web and Writes the Report Instead of Your Analyst

AI in Recruitment and HR 2026 — CV Screening Automation, EU AI Act Obligations, and When AI Helps vs Hurts

Signal received?

TerminateSilence

Terminate
Silence