LangGraph is a Python library from LangChain that lets you build AI agents as directed graphs with managed state: each node is a step of logic, and edges (including conditional ones) decide the flow. It is the de facto standard for building production agent workflows in 2026, because it solves the fundamental problems of simple agent loops: no control over state, hard debugging, and no way to resume interrupted tasks. Instead of a while-loop with an LLM inside, you get an explicit, controllable state machine that can pause, resume and run parallel branches, with checkpointing built in.
Complete guide to LangGraph: how StateGraph works with nodes, edges and conditional routing, the ReAct pattern step by step, human-in-the-loop with interrupt/resume, PostgreSQL persistence with checkpointers, parallel branches with Send API, streaming, and comparison with CrewAI, AutoGen and Pydantic AI.
Imagine an agent that analyses a financial report: first it extracts data, then in parallel checks two databases, then — if the data is contradictory — asks a human for a resolution, and finally generates the result. Doing that in a plain while-loop is a maze of if-else. In LangGraph it is a readable graph with five nodes. This article will walk you through the key concepts, production patterns and a comparison with other frameworks.
What is LangGraph and when to use it
[Figure: LangGraph state graph flow for a ReAct agent, with key API labels]
LangGraph extends LangChain with a control-flow graph with managed state. Key concepts:
- StateGraph — the main class; you define the state schema (TypedDict or Pydantic) and add nodes
- Node — a Python function that receives the state and returns an update
- Edge — a connection between nodes; can be unconditional or conditional
- Conditional edge — a routing function that chooses the next node based on state
- START / END — special framework nodes; every graph begins at START and ends at END
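To make these concepts concrete, here is a minimal plain-Python sketch of what a state-graph runtime does conceptually. It does not use LangGraph and is not how the library is implemented; `run_graph`, the string sentinels for `START` and `END`, and the node names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable

START, END = "__start__", "__end__"   # hypothetical sentinels for this sketch

State = dict
Node = Callable[[State], dict]        # a node returns a partial update
Router = Callable[[State], str]       # a router returns the next node name


def run_graph(nodes: dict, routers: dict, state: State, max_steps: int = 25) -> State:
    """Run nodes until a router returns END, merging each partial update."""
    current = routers[START](state)
    for _ in range(max_steps):
        if current == END:
            return state
        update = nodes[current](state)
        state = {**state, **update}   # simple overwrite-merge, no reducers here
        current = routers[current](state)
    raise RuntimeError("step limit reached without hitting END")


def increment(state: State) -> dict:
    return {"count": state["count"] + 1}


def report(state: State) -> dict:
    return {"summary": f"counted to {state['count']}"}


nodes = {"increment": increment, "report": report}
routers = {
    START: lambda s: "increment",                                        # unconditional edge
    "increment": lambda s: "increment" if s["count"] < 3 else "report",  # conditional edge
    "report": lambda s: END,                                             # unconditional edge
}

final_state = run_graph(nodes, routers, {"count": 0})
print(final_state)  # {'count': 3, 'summary': 'counted to 3'}
```

The `max_steps` guard plays the role of an iteration limit: a graph whose routers never return `END` fails loudly instead of looping forever.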
LangGraph is the right choice when:
- The agent needs to act multiple times (loops, iterations, retries)
- You need human-in-the-loop (approvals, corrections mid-flight)
- The workflow has conditional branches or parallel lanes
- The task can be interrupted and resumed (long operations, asynchronous)
- You want to debug and visualise the flow step by step
When LangGraph is overkill: for a simple agent with a single loop, plain Python with an LLM and tool calling is enough. LangGraph shines with complex, multi-step flows.
First graph in LangGraph — ReAct agent from scratch
The ReAct pattern (Reasoning + Acting) is the most popular starting point: the agent thinks, decides on an action, executes a tool, observes the result and moves forward — or finishes. In LangGraph this maps naturally to a graph.
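```python
import operator
from typing import Annotated, Sequence, TypedDict

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import BaseMessage
from langchain_core.tools import tool
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode


# Placeholder tool so the example is complete; replace with your own tools.
@tool
def get_word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())


tools = [get_word_count]


class AgentState(TypedDict):
    # operator.add appends each node's new messages to the history
    messages: Annotated[Sequence[BaseMessage], operator.add]


model = ChatAnthropic(model="claude-haiku-4-5-20251001").bind_tools(tools)


def agent_node(state: AgentState) -> dict:
    """Call the model with the conversation so far."""
    response = model.invoke(state["messages"])
    return {"messages": [response]}


def should_continue(state: AgentState) -> str:
    """Route to the tools node if the model requested a tool call, else finish."""
    last = state["messages"][-1]
    if last.tool_calls:
        return "tools"
    return END


graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")
app = graph.compile()
```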
This example shows a complete ReAct cycle in roughly 30 lines of graph code. The key element is the should_continue function: conditional routing that decides whether to go to the tools node or finish. ToolNode is a prebuilt utility that executes the tool calls found in the last AI message and returns their results as tool messages.
State management — the foundation of LangGraph
State in LangGraph is a managed structure with update-merge rules. The annotation Annotated[list, operator.add] tells LangGraph to append new items to the existing list on each node update instead of overwriting it.
```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph


class WorkflowState(TypedDict):
    messages: Annotated[list, operator.add]
    current_step: str
    tool_results: Annotated[list, operator.add]
    final_answer: str | None
```

Best practices for state management:
- Keep state flat and simple: deeply nested objects are harder to debug
- Use Pydantic instead of TypedDict for validation and default values
- Each node returns only the keys it changed; there is no need to return the whole state
- Store in state the context needed for routing decisions, not just output data
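The merge rule described above can be illustrated without LangGraph at all. The following is a simplified sketch of reducer-based merging, assuming a hypothetical helper `apply_update`; it reads the reducer from the `Annotated` metadata and falls back to plain overwrite, which approximates the behaviour the article describes rather than reproducing the library's internals.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class WorkflowState(TypedDict):
    messages: Annotated[list, operator.add]   # reducer: concatenate lists
    current_step: str                         # no reducer: last write wins


def apply_update(schema: type, state: dict, update: dict) -> dict:
    """Merge a node's partial update into state, honouring Annotated reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)   # e.g. old + new
        else:
            merged[key] = value                             # plain overwrite
    return merged


state = {"messages": ["hello"], "current_step": "start"}
state = apply_update(WorkflowState, state, {"messages": ["world"]})
state = apply_update(WorkflowState, state, {"current_step": "done"})
print(state)  # {'messages': ['hello', 'world'], 'current_step': 'done'}
```

Note that each update contains only the keys the node changed, which is why returning partial dictionaries from nodes is enough.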
Persistence — checkpointing and resumption
This is the feature that separates LangGraph from plain loops. A checkpointer saves a snapshot of state after every step — so you can resume an interrupted session, review step history for debugging, roll back to a previous state, and implement human-in-the-loop with suspension.
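```python
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.checkpoint.sqlite import SqliteSaver

# `graph` is the StateGraph built earlier.
config = {"configurable": {"thread_id": "session-abc-123"}}
inputs = {"messages": [HumanMessage("Analyse Q1 report")]}

# Development: SQLite (in-memory here). In recent releases from_conn_string
# is a context manager, so the saver is only valid inside the `with` block.
with SqliteSaver.from_conn_string(":memory:") as memory_saver:
    app = graph.compile(checkpointer=memory_saver)
    result = app.invoke(inputs, config)

# Production: PostgreSQL
with PostgresSaver.from_conn_string("postgresql://user:pass@host/db") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    app = graph.compile(checkpointer=checkpointer)
    result = app.invoke(inputs, config)
```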
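thread_id is the session key: the same ID continues the same session, a new ID starts a new independent session. In production always pass thread_id. Without it the checkpointer has no key to save or load history under, so the run cannot be resumed later (recent LangGraph versions reject such a call with an error when a checkpointer is configured).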
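To show why thread_id matters, here is a small stdlib-only sketch of the idea behind a checkpointer: snapshots are stored per thread and per step, so a session can be resumed or rolled back. `TinyCheckpointer` and its table layout are hypothetical and far simpler than the real SqliteSaver or PostgresSaver.

```python
import json
import sqlite3
from typing import Optional


class TinyCheckpointer:
    """Hypothetical stand-in for a checkpointer: one JSON snapshot per step."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id: str, state: dict) -> int:
        """Append a snapshot for this thread and return its step number."""
        (step,) = self.conn.execute(
            "SELECT COALESCE(MAX(step), -1) + 1 FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()
        return step

    def get(self, thread_id: str, step: Optional[int] = None) -> Optional[dict]:
        """Load the latest snapshot, or a specific earlier step for rollback."""
        if step is None:
            query = ("SELECT state FROM checkpoints WHERE thread_id = ? "
                     "ORDER BY step DESC LIMIT 1")
            params = (thread_id,)
        else:
            query = "SELECT state FROM checkpoints WHERE thread_id = ? AND step = ?"
            params = (thread_id, step)
        row = self.conn.execute(query, params).fetchone()
        return json.loads(row[0]) if row else None


saver = TinyCheckpointer(sqlite3.connect(":memory:"))
saver.put("session-abc-123", {"messages": ["Analyse Q1 report"]})
saver.put("session-abc-123", {"messages": ["Analyse Q1 report", "Draft ready"]})
saver.put("session-xyz-999", {"messages": ["Unrelated task"]})

latest = saver.get("session-abc-123")          # resume where the session left off
first = saver.get("session-abc-123", step=0)   # roll back to the first snapshot
unknown = saver.get("never-seen")              # new thread: no history yet
```

The two sessions never see each other's snapshots, which is the isolation that a distinct thread_id provides.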
Human-in-the-loop — suspend and approve
[Figure: LangGraph vs CrewAI vs AutoGen vs Pydantic AI]
One of the most powerful patterns: the agent suspends before a sensitive operation and waits for human approval.
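```python
from langgraph.types import Command, interrupt


def review_before_send(state: WorkflowState) -> dict:
    """Pause for human review; assumes the state schema has a draft_email key."""
    draft_email = state["draft_email"]
    # interrupt() pauses the run and surfaces this payload to the caller
    approved = interrupt({
        "action": "review_email",
        "draft": draft_email,
        "message": "Approve or revise the email before sending",
    })
    if approved["decision"] == "approve":
        return {"send_email": True}
    return {"send_email": False, "draft_email": approved.get("revised", draft_email)}


# interrupt() needs a checkpointer to store the paused state;
# `checkpointer`, `inputs` and `config` are defined as in the persistence example
app = graph.compile(checkpointer=checkpointer)

# First call runs until interrupt() pauses inside review_before_send
app.invoke(inputs, config)

# After the human decides, resume the same thread with their answer
app.invoke(Command(resume={"decision": "approve"}), config)
```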
The human-in-the-loop pattern is indispensable in production systems: the agent can prepare an email, a quote or a purchase decision — and stop before the actual action. A human approves, corrects or rejects. Only then does the system proceed.
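One detail worth internalising is what happens on resume. As the LangGraph documentation describes it at the time of writing, a resumed node runs again from its beginning and the interrupt call then returns the human's answer. The sketch below is a simplified plain-Python model of that behaviour, not LangGraph code; `Interrupted` and `make_interrupt` are hypothetical names.

```python
class Interrupted(Exception):
    """Raised to pause the run; carries the payload shown to the human."""

    def __init__(self, payload: dict) -> None:
        super().__init__("waiting for human input")
        self.payload = payload


def make_interrupt(resume_value):
    """Build an interrupt() that pauses first, then returns the answer on resume."""
    def interrupt(payload: dict):
        if resume_value is None:
            raise Interrupted(payload)
        return resume_value
    return interrupt


calls = []   # records every start of the node body, to show re-execution


def review_node(state: dict, interrupt) -> dict:
    calls.append("node started")
    decision = interrupt({"action": "review_email", "draft": state["draft_email"]})
    if decision["decision"] == "approve":
        return {"send_email": True}
    return {"send_email": False,
            "draft_email": decision.get("revised", state["draft_email"])}


state = {"draft_email": "Hello, please find the quote attached."}

# First run: no human answer yet, so the node pauses.
try:
    review_node(state, make_interrupt(None))
    pending = None
except Interrupted as pause:
    pending = pause.payload

# Later: the node runs again from the top, this time with the human's answer.
answer = {"decision": "revise", "revised": "Hello, updated quote attached."}
state = {**state, **review_node(state, make_interrupt(answer))}
```

Because the node body runs twice here, anything placed before the interrupt call (an API request, a database write) should be idempotent or moved to a separate node.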
Parallel branches — Send API and fan-out
LangGraph lets you run nodes in parallel using the Send API. The map-reduce pattern: one node generates a list of items, each is processed in parallel, results are collected.
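```python
from langgraph.types import Send


def generate_tasks(state: WorkflowState) -> list[Send]:
    """Fan-out: emit one Send per document, each with its own private input."""
    documents = state["documents"]
    return [Send("analyze_doc", {"doc": doc}) for doc in documents]


def analyze_doc(state: dict) -> dict:
    """Analyse a single document; `llm` is a chat model defined elsewhere."""
    doc = state["doc"]
    analysis = llm.invoke("Analyse: " + doc["content"])
    # "analyses" needs a list reducer (e.g. operator.add) in the state schema
    # so that results from parallel branches are appended, not overwritten
    return {"analyses": [{"doc_id": doc["id"], "result": analysis.content}]}


# Assumes "prepare_tasks" and "analyze_doc" were registered with graph.add_node
graph.add_conditional_edges("prepare_tasks", generate_tasks)
```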
Instead of processing 10 documents sequentially (10× the time), LangGraph runs them in parallel, so total time is roughly that of the slowest single document, provided rate limits and concurrency settings allow it. Key use cases: batch processing, many API calls at once, parallel analysis of multiple sources.
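The fan-out and fan-in shape can be tried without any LLM. This sketch uses a thread pool from the standard library as a stand-in for what the Send API arranges inside a graph; `analyze_doc` here is a fake that sleeps briefly, and the documents are made-up sample data.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def analyze_doc(doc: dict) -> dict:
    """Stand-in for an LLM call: sleeps briefly, then returns a fake analysis."""
    time.sleep(0.05)
    return {"doc_id": doc["id"], "result": f"{len(doc['content'].split())} words"}


documents = [
    {"id": 1, "content": "revenue grew in the first quarter"},
    {"id": 2, "content": "costs were flat"},
    {"id": 3, "content": "headcount increased slightly year over year"},
]

started = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(documents)) as pool:
    # fan-out: one task per document; fan-in: results collected in input order
    analyses = list(pool.map(analyze_doc, documents))
elapsed = time.perf_counter() - started

print(analyses)
print(f"elapsed: {elapsed:.2f}s for {len(documents)} documents")
```

With three workers the elapsed time should sit close to a single 0.05 s call rather than three of them, which is the effect the paragraph above describes.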
Streaming — real-time results
LangGraph supports streaming at multiple levels — from tokens to state updates to node-specific events.
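```python
import asyncio

from langchain_core.messages import HumanMessage

# `app` is a compiled graph and `config` carries the thread_id, as before.
inputs = {"messages": [HumanMessage("Analyse the data")]}


async def stream_tokens() -> None:
    """Token-level streaming plus a marker when the agent node finishes."""
    async for chunk in app.astream_events(inputs, config, version="v2"):
        if chunk["event"] == "on_chat_model_stream":
            token = chunk["data"]["chunk"].content
            print(token, end="", flush=True)
        elif chunk["event"] == "on_chain_end" and chunk["name"] == "agent":
            print("\n[Agent finished step]")


async def stream_updates() -> None:
    """State-update streaming: one item per node that finished."""
    async for state_update in app.astream(inputs, config):
        node_name = list(state_update.keys())[0]
        print("Node " + node_name + " finished")


asyncio.run(stream_tokens())
asyncio.run(stream_updates())
```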
Streaming is a must in a UI: the user sees progress instead of waiting tens of seconds for the final result. In practice you stream to the frontend via WebSocket or Server-Sent Events.
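For the Server-Sent Events route, the server mostly needs to wrap each token in an SSE frame. The sketch below shows that framing with the standard library only; `fake_token_stream` is a hypothetical stand-in for tokens coming out of the graph, and the JSON payload shape and the final `done` event are illustrative choices, not a convention defined by LangGraph.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_token_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for tokens produced by the graph."""
    for token in ["Revenue ", "grew ", "12%."]:
        await asyncio.sleep(0)   # yield control, as a real stream would
        yield token


async def to_sse(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in an SSE frame and finish with a 'done' event."""
    async for token in tokens:
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    yield "event: done\ndata: {}\n\n"


async def collect() -> list:
    return [frame async for frame in to_sse(fake_token_stream())]


frames = asyncio.run(collect())
for frame in frames:
    print(repr(frame))
```

In a web framework the `to_sse` generator would be handed to a streaming response with the `text/event-stream` content type instead of being collected into a list.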
Subgraphs and multi-agent systems
LangGraph supports nested graphs — subgraphs as nodes in the main graph. This is the foundation of multi-agent architectures: an orchestrator and specialised agents.
```python
from langgraph.graph import END, START, StateGraph

# Specialised subgraph with its own state schema
finance_graph = StateGraph(FinanceState)
finance_graph.add_node("fetch_data", fetch_financial_data)
finance_graph.add_node("analyze", analyze_financials)
finance_graph.add_edge(START, "fetch_data")
finance_graph.add_edge("fetch_data", "analyze")
finance_graph.add_edge("analyze", END)
finance_agent = finance_graph.compile()

# Orchestrator graph: a compiled subgraph can be added as a node when it shares
# state keys with the parent; otherwise call it from a wrapper node that maps state
main_graph = StateGraph(OrchestratorState)
main_graph.add_node("route_task", route_incoming_task)
main_graph.add_node("finance_agent", finance_agent)
main_graph.add_node("legal_agent", legal_agent)  # built the same way as finance_agent
main_graph.add_node("synthesize", synthesize_results)
```
Each agent has its own state and internal logic. This approach scales better than one large prompt with all the logic, because each agent is specialised, independently testable and easy to swap out.
LangGraph vs other frameworks
| Criterion | LangGraph | CrewAI | AutoGen | Pydantic AI |
|---|---|---|---|---|
| Paradigm | State graph | Role-based agents | Agent conversations | Typed agents |
| Control flow | Full, explicit | Limited | Limited | Full, explicit |
| Human-in-the-loop | Native (interrupt) | External | Via config | Native |
| Persistence | Native (PostgreSQL) | External | External | Native (PostgreSQL) |
| Streaming | Native (astream_events) | Limited | Limited | Native |
| Debugging | LangGraph Studio | None built-in | None built-in | Logfire |
| Best for | Complex workflows, prod | Fast start, role-based | Research, prototypes | Pydantic stack |
LangGraph — choose when you need full control, multi-step workflows with state management, human-in-the-loop and streaming in production. The most mature production option in 2026.
CrewAI — choose for a fast start with the role-based model. Good for simpler pipelines without complex routing.
AutoGen — choose for research and prototyping; the conversational model is intuitive but harder to control in production.
Pydantic AI — choose if you already use Pydantic and want a typed Python API. Growing fast, good integration with Logfire for monitoring.
Production patterns
Reflection (Self-critique) — The agent generates a response, evaluates its own quality, corrects or finishes. Implementation: a "generate" node + a "critique" node + a conditional edge checking the score.
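A rough plain-Python sketch of the Reflection loop follows. The generator and critic are deterministic stand-ins (a real system would call an LLM in both), the 0.8 threshold and the cap of 5 iterations are arbitrary illustrative values, and the small driver loop stands in for the graph runtime.

```python
def generate(state: dict) -> dict:
    """Stand-in generator: each revision appends more detail to the draft."""
    draft = state.get("draft", "Summary") + " +detail"
    return {"draft": draft, "iteration_count": state.get("iteration_count", 0) + 1}


def critique(state: dict) -> dict:
    """Stand-in critic: scores by length; a real critic would be an LLM call."""
    return {"score": min(1.0, len(state["draft"]) / 30)}


def route_after_critique(state: dict) -> str:
    """Conditional edge: stop on a good score or when the iteration cap is hit."""
    if state["score"] >= 0.8 or state["iteration_count"] >= 5:
        return "end"
    return "generate"


state: dict = {}
next_node = "generate"
while next_node != "end":   # the graph runtime would drive this loop
    state = {**state, **generate(state)}
    state = {**state, **critique(state)}
    next_node = route_after_critique(state)

print(state)
```

The iteration cap in `route_after_critique` matters as much as the score check: without it, a critic that never returns a passing score would keep the loop running indefinitely.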
Plan-and-Execute — A planner creates a list of steps, an executor processes them sequentially or in parallel, a synthesiser combines the results. Good for long tasks requiring many tools.
Orchestrator-Workers — An orchestrator assigns tasks, workers execute in parallel, the orchestrator collects and decides on the next step. The fan-out/fan-in equivalent.
Routing with a classifier — The first node classifies the user's intent, a conditional edge routes to a specialised agent. Instead of one large agent handling everything.
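The classifier routing can be sketched as a node that writes an intent into state plus a routing function that maps the intent to a node name. The keyword matching below is a deliberately naive stand-in for an LLM classifier, and the intent labels and node names are hypothetical.

```python
def classify_intent(state: dict) -> dict:
    """Stand-in classifier using keywords; in practice this would be an LLM call."""
    text = state["question"].lower()
    if any(word in text for word in ("invoice", "revenue", "budget")):
        intent = "finance"
    elif any(word in text for word in ("contract", "gdpr", "liability")):
        intent = "legal"
    else:
        intent = "general"
    return {"intent": intent}


ROUTES = {
    "finance": "finance_agent",
    "legal": "legal_agent",
    "general": "general_agent",
}


def route_by_intent(state: dict) -> str:
    """Conditional edge: map intent to a node name, with a safe fallback."""
    return ROUTES.get(state["intent"], "general_agent")


questions = [
    "Can you check the invoice totals for March?",
    "Does this contract limit our liability?",
    "What is the weather like?",
]
decisions = [route_by_intent(classify_intent({"question": q})) for q in questions]
print(decisions)  # ['finance_agent', 'legal_agent', 'general_agent']
```

Keeping the fallback in `route_by_intent` means an unexpected label from the classifier still lands on a valid node instead of failing the run.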
Deployment — LangGraph Platform and LangGraph Studio
LangGraph Studio is a visual debugger: you see the graph, trace state step by step, can edit state and resume from any point. Invaluable when debugging complex workflows.
LangGraph Platform is managed hosting with automatic scaling, persistent storage, a REST API for interacting with graphs, webhooks for async tasks and built-in monitoring. For self-hosted: a LangGraph application is a plain Python application — you deploy it like FastAPI with your own PostgreSQL as the checkpointer.
Common mistakes and production deployment checklist
Common mistakes:
1. Infinite loop without an exit condition: every loop needs a path to END; add an iteration counter to state
2. State too large: keep only what you need for decisions; store large data externally and keep only a reference in state
3. No thread_id with checkpointing: the checkpointer has no session key to save or load under, so history cannot be resumed (recent versions raise an error)
4. Blocking synchronous code in an async graph: when you use astream, prefer async nodes and async clients; LangGraph can still run sync nodes in a worker thread, but a blocking call inside an async node stalls the event loop
5. Sensitive data leaking through state: state is written to the checkpointer; do not store tokens or PII in plain text
6. No error handling for tools: your routing logic must handle the "all tools failed" scenario

Production checklist:
7. Define state as a Pydantic model: validation, default values, documentation
8. Always add an iteration limit to every loop (an "iteration_count" field in state)
9. Connect PostgresSaver with dedicated checkpoint tables; use SQLite in dev only
10. Every production call has a thread_id: a session or task identifier
11. Enable human-in-the-loop on sensitive operations (send, payment, write)
12. Add tracing (LangSmith or Langfuse): log every step and see where the agent gets lost
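The advice about oversized state (item 2 above) can be sketched as follows: a node stores the large payload externally and writes only a short reference into state. `BLOB_STORE` is a hypothetical in-memory stand-in for object storage or a database, and using a SHA-256 digest as the key is one possible choice among many.

```python
import hashlib

BLOB_STORE: dict = {}   # hypothetical stand-in for S3, a database or a disk


def put_blob(data: bytes) -> str:
    """Store data externally and return a short content-derived reference."""
    key = hashlib.sha256(data).hexdigest()
    BLOB_STORE[key] = data
    return key


def get_blob(key: str) -> bytes:
    return BLOB_STORE[key]


def load_report(state: dict) -> dict:
    """Node: fetch a large document, keep only a reference in state."""
    report = b"Q1 report " * 10_000   # 100,000 bytes of sample data
    return {"report_ref": put_blob(report), "report_size": len(report)}


def summarize_report(state: dict) -> dict:
    """Node: dereference the blob only where the content is actually needed."""
    report = get_blob(state["report_ref"])
    return {"summary": f"report starts with {report[:9].decode()!r}"}


state: dict = {}
state = {**state, **load_report(state)}
state = {**state, **summarize_report(state)}
print(state["report_size"], len(state["report_ref"]), state["summary"])
```

Every checkpoint then serialises a 64-character reference instead of the full document, which keeps snapshots small and keeps bulky or sensitive payloads out of the checkpoint tables.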
Key takeaways
LangGraph is the de facto standard for AI agent orchestration in Python in 2026. It turns an agent from a plain while-loop into an explicit, controllable state machine — with checkpointing, full step history, built-in human-in-the-loop, parallel branches and streaming. Key concepts: StateGraph, nodes (Python functions), conditional edges (routing), checkpointer (persistence) and Send API (parallelism). In production use PostgresSaver, always pass thread_id, stream via astream_events and add an iteration limit to every loop. LangGraph beats CrewAI and AutoGen when you need full control, complex routing and production reliability.
