AI Agents — What They Are, How They Work, and When to Deploy Them
ChatGPT answers questions. An AI agent asks them itself, searches the web, makes decisions, and executes tasks — without your involvement. I explain the architecture, agent types, and when it actually makes sense to invest.
Mark runs a marketing agency — eight people, a dozen clients, a permanent shortage of time. He spends every morning on email: qualifying leads, responding to pricing enquiries, routing requests to the right specialists. In the afternoons he writes proposal briefs, researches potential clients before meetings, and updates the CRM. By 6pm he has not touched any of the actual project work.
When I asked him whether he had heard of AI agents, he replied: "Same thing as ChatGPT, right?" That is a fair question. And the answer is exactly where the important distinction begins.
What an AI Agent Is — and What It Is Not
In its basic form, ChatGPT is a chat interface to a language model. It waits for you to type a question, generates a response, and waits for the next one. It does nothing when you are not asking. It does not remember previous conversations unless a memory feature or plugin is enabled. By default it does not take actions on your behalf.
An AI agent is something different. It receives a goal — for example "prepare me a client brief before tomorrow's meeting" — and independently plans what steps to take to achieve it. It searches for information about the company, checks the CRM history, pulls the latest proposal from the drive, combines the data, and delivers a finished document. Without your involvement at every step.
The difference between a chatbot, automation, and an AI agent is fundamental. To see it clearly:
| Feature | Chatbot | Automation (Zapier/Make) | AI Agent |
|---|---|---|---|
| Responds to questions | ✓ | ✗ | ✓ |
| Initiates actions on its own | ✗ | Only on trigger | ✓ |
| Makes decisions | ✗ | ✗ | ✓ |
| Handles multi-step tasks | ✗ | Limited | ✓ |
| Uses tools dynamically | ✗ | Static | ✓ |
| Remembers context long-term | ✗ | ✗ | ✓ |
| Handles exceptions | ✗ | ✗ | ✓ |
A chatbot responds. Automation executes a fixed flow. An agent plans and decides.
That last point is critical. Zapier can send an email when a form is submitted. An agent can read that email, understand what the client wants, check whether they match the ICP criteria, decide whether to respond immediately or escalate to a human, generate a personalised reply, and log everything in the CRM — all without a single if-else written by a developer.
How an AI Agent Works — Architecture
An AI agent is not a single model. It is a system built from four elements working together.
LLM (Large Language Model) is the brain of the agent: a model such as GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro that understands the instruction, plans the steps, and interprets the results. On its own it can only produce text; it needs the other components to act.
Tools are the hands of the agent. A set of functions the LLM can call: web search, database, email API, Python script, spreadsheet. The agent does not guess — it uses a tool, gets the result, analyses it, and decides what to do next.
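To make "a set of functions the LLM can call" concrete, here is a minimal sketch of a tool registry. The names `Tool`, `ToolRegistry`, and the `word_count` tool are hypothetical and exist only for illustration; real frameworks expose their own equivalents.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    """A named function plus the description the LLM sees when choosing."""
    name: str
    description: str
    func: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry that maps tool names to callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> str:
        # This text would be placed in the prompt so the model can pick a tool.
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def execute(self, name: str, params: Dict[str, Any]) -> Any:
        if name not in self._tools:
            # Return the error as an observation instead of crashing the loop.
            return f"error: unknown tool '{name}'"
        return self._tools[name].func(**params)


registry = ToolRegistry()
registry.register(Tool("word_count", "Count words in a text.", lambda text: len(text.split())))
```

Returning an error string for an unknown tool is a design choice: the agent can read the observation and try something else rather than stopping on an exception.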
Memory operates at two levels. Short-term memory is the context of the current session: the full history of actions the agent has taken and the observations it has collected so far. Long-term memory lives in an external store, typically a vector database (e.g. Pinecone, Chroma), where the agent stores and retrieves information between sessions. Without long-term memory an agent "forgets" between tasks.
Planner / Orchestrator is the decision logic. It decides when to call a tool, when to ask the user for clarification, when to consider the task complete. The most commonly used pattern is ReAct (Reason + Act) — I think, I act, I observe, I think again.
The agent loop works as follows: the agent receives a goal, selects a tool, executes an action, observes the result, and decides whether to continue or whether the task is done. This cycle can run a dozen times before the agent delivers the final output.
Here is what a simplified implementation of this loop looks like:
```python
# Simplified ReAct loop (Reason + Act).
# `llm` and `tools` are placeholder objects, not a real library API.
def run_agent(llm, tools, original_goal, available_tools):
    current_context = []
    done = False
    while not done:
        thought = llm.reason(current_context, available_tools)
        action = llm.choose_tool(available_tools, thought)
        observation = tools.execute(action.name, action.params)
        current_context.append({"thought": thought, "action": action, "result": observation})
        done = llm.check_goal_reached(current_context, original_goal)
    return llm.summarize(current_context)
```
The loop continues until the agent assesses the goal is achieved or encounters a situation requiring human intervention. In practice you add iteration limits and a human-in-the-loop checkpoint as a safety net against infinite loops.
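A sketch of those two safety nets, assuming the LLM call is wrapped in a `decide` function that returns a small dict. The function names, the dict shape, and the limit of five iterations are all illustrative assumptions, and the "LLM" here is a scripted stub so the example runs on its own.

```python
from typing import Callable, Dict, List

MAX_ITERATIONS = 5  # assumed limit; tune per task


def run_agent(goal: str,
              decide: Callable[[str, List[dict]], dict],
              tools: Dict[str, Callable[[str], str]],
              max_iterations: int = MAX_ITERATIONS) -> dict:
    """ReAct-style loop with an iteration cap and an escalation path.

    `decide` stands in for the LLM call. It returns a dict whose "type"
    is one of "tool", "finish", or "ask_human".
    """
    context: List[dict] = []
    for step in range(max_iterations):
        decision = decide(goal, context)
        if decision["type"] == "finish":
            return {"status": "done", "answer": decision["answer"], "steps": step}
        if decision["type"] == "ask_human":
            return {"status": "needs_human", "question": decision["question"], "steps": step}
        observation = tools[decision["tool"]](decision["input"])
        context.append({"action": decision, "observation": observation})
    # The cap was hit: hand over to a person instead of looping forever.
    return {"status": "needs_human", "question": "Iteration limit reached", "steps": max_iterations}


def scripted_decide(goal: str, context: List[dict]) -> dict:
    # Fake "LLM": look something up once, then finish with what it found.
    if not context:
        return {"type": "tool", "tool": "lookup", "input": goal}
    return {"type": "finish", "answer": context[-1]["observation"]}


def never_finishes(goal: str, context: List[dict]) -> dict:
    # Fake "LLM" that keeps calling the same tool, to exercise the cap.
    return {"type": "tool", "tool": "lookup", "input": goal}


tools = {"lookup": lambda query: f"result for {query}"}
ok = run_agent("acme", scripted_decide, tools)
stuck = run_agent("acme", never_finishes, tools)
```

The first run finishes after one tool call; the second never declares itself done, so the loop stops at the cap and returns a `needs_human` status rather than spending tokens indefinitely.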
Figure: AI agent architecture (the ReAct loop)
Types of Agents
There is no single "AI agent". There are several architectures, each with different trade-offs between speed, quality, and cost.
| Agent type | How it plans | Example use case | Deployment cost | When to choose |
|---|---|---|---|---|
| Reactive (ReAct) | Step by step, no prior plan | Responding to emails | Low | Simple, repetitive tasks |
| Planning (Plan & Execute) | Creates plan before acting | Research, reports | Medium | Complex tasks with a clear output |
| Reflective (Self-reflection) | Evaluates its own results and improves | Code generation, legal analysis | High | Tasks requiring high accuracy |
| Multi-agent (CrewAI) | Several agents collaborate | Sales pipeline, content | High | Large multi-step processes |
ReAct is the most commonly used — simple, predictable, easy to debug. The right starting point.
Plan & Execute creates a full task plan before executing. Better for complex research tasks where the end result is known — for example "write a market research report on five competitors". The agent plans all steps first, then executes them.
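The difference from ReAct is easiest to see in code: the plan is produced once, up front, and only then executed. This is a minimal sketch in which `toy_planner` and `toy_executor` stand in for LLM calls, and the competitor names are made up.

```python
from typing import Callable, List


def plan_and_execute(goal: str,
                     planner: Callable[[str], List[str]],
                     executor: Callable[[str], str]) -> List[dict]:
    """Create the whole plan first, then run each step in order."""
    plan = planner(goal)  # one planning call up front
    results = []
    for step in plan:
        results.append({"step": step, "output": executor(step)})
    return results


def toy_planner(goal: str) -> List[str]:
    # Stand-in for an LLM planning call; competitor names are invented.
    competitors = ["Alpha", "Beta"]
    return [f"research {name}" for name in competitors] + ["write summary"]


def toy_executor(step: str) -> str:
    # Stand-in for the tool calls that would carry out one step.
    return f"done: {step}"


report = plan_and_execute("competitor report", toy_planner, toy_executor)
```

Because the plan exists before anything runs, it can be shown to a human for approval first, which is one reason this pattern suits tasks with a known end result.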
Self-reflection is an agent that evaluates its own work after completing a task and improves it before delivery. More expensive in tokens, but significantly better on tasks requiring precision — code generation, legal analysis, proposal writing.
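A reflective agent boils down to a generate, critique, revise cycle with a cap on rounds. The sketch below uses two stub functions in place of LLM calls; the function names and the three-round limit are assumptions for illustration.

```python
from typing import Callable, Tuple


def reflect_and_revise(task: str,
                       generate: Callable[[str, str], str],
                       critique: Callable[[str], Tuple[bool, str]],
                       max_rounds: int = 3) -> Tuple[str, int]:
    """Generate a draft, critique it, and revise until accepted or out of rounds."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)  # each round costs extra tokens
        accepted, feedback = critique(draft)
        if accepted:
            return draft, round_no
    return draft, max_rounds  # best effort once the cap is reached


def toy_generate(task: str, feedback: str) -> str:
    # Stand-in for an LLM call: applies the feedback literally.
    return f"{task} (with price)" if "price" in feedback else task


def toy_critique(draft: str) -> Tuple[bool, str]:
    # Stand-in for a second LLM call acting as reviewer.
    if "price" in draft:
        return True, ""
    return False, "missing price"


final_draft, rounds_used = reflect_and_revise("Proposal for Acme", toy_generate, toy_critique)
```

Here the first draft is rejected for a missing price and the second is accepted, so the task takes two generation calls and two critique calls. That doubling is where the extra token cost comes from.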
Multi-agent is an architecture where several specialised agents collaborate like a team. One collects data, another analyses it, a third writes the report. Works well with complex processes that can be divided into independent specialisations. The CrewAI and AutoGen frameworks implement this pattern.
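Stripped of framework detail, the collect, analyse, write example is a hand-off between role-specific agents. The sketch below is not the CrewAI or AutoGen API; `run_crew` and the three role functions are hypothetical stand-ins, each representing an LLM-backed agent with its own prompt.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str], str]


def run_crew(task: str, crew: List[Tuple[str, Agent]]) -> Tuple[str, List[str]]:
    """Pass the work product from one specialised agent to the next."""
    work = task
    trace = []
    for role, agent in crew:
        work = agent(work)
        trace.append(role)  # keep a record of who handled the work
    return work, trace


def collector(topic: str) -> str:
    return f"data about {topic}"


def analyst(data: str) -> str:
    return f"analysis of {data}"


def writer(analysis: str) -> str:
    return f"report: {analysis}"


output, trace = run_crew(
    "competitor pricing",
    [("collector", collector), ("analyst", analyst), ("writer", writer)],
)
```

This shows only the simplest sequential hand-off. Real multi-agent frameworks add delegation, shared memory, and back-and-forth dialogue between agents on top of it.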
Frameworks and Tools
Building an agent from scratch makes no sense. There are several mature frameworks that differ in philosophy and target audience.
| Tool | Difficulty level | What it offers | For whom |
|---|---|---|---|
| LangChain + LangGraph | Advanced | Full control, agent types | Python developers |
| CrewAI | Intermediate | Multi-agent, role-based | Developers with AI experience |
| n8n (AI nodes) | Low-Medium | No-code/low-code, visual | Companies without a developer |
| AutoGen (Microsoft) | Advanced | Dialog between agents | Enterprise, R&D |
| Claude Tool Use API | Intermediate | Native Anthropic tools | API-first projects |
| Flowise | Low | Drag & drop LangChain | Prototypes, small businesses |
From my own experience: LangGraph gives the greatest control, but requires Python knowledge and time to learn. State graphs, transition conditions, built-in checkpoints — this is a tool for someone who understands software architecture, not just prompting.
n8n with AI nodes is the fastest route to a working agent in a company without a tech team. I have deployed several such solutions — visual flow editor, built-in connectors for CRM, email, spreadsheets — everything ready. Limitations appear with more complex decision logic.
CrewAI is my favourite framework for building production agents: readable syntax, good documentation, proven multi-agent patterns. Even so, I choose the tool for the specific problem, not the other way around.
Flowise is a good choice for a prototype — drag components, connect with arrows, get a working agent in an hour. Not suitable for production with serious reliability requirements, but excellent for a Proof of Concept.
Concrete Use Cases — Where an Agent Delivers Real Value
Theory covered. Let us move to numbers, because that is the only measure that matters in a conversation with a business owner.
| Area | What the agent does | Manual time | With agent | Weekly saving |
|---|---|---|---|---|
| Sales email handling | Classifies, generates response drafts | 3 min/email x 50 emails | 30 sec review | ~20 hrs |
| B2B client research | Collects company data, news, contacts | 45 min/company | 8 minutes | ~12 hrs |
| Proposal generation | Brief from CRM to proposal draft | 30-45 min/proposal | 3-5 min review | ~15 hrs |
| Mention monitoring | Web scraping, sentiment, report | 2 hrs/day | Automated report at 7am | ~10 hrs |
| Campaign reporting | GA4 + Meta + Google Ads to PDF | 3 hrs/week | Automated report | 3 hrs |
The best example I deployed recently: B2B lead qualification agent.
A company was receiving leads through a website form and LinkedIn. Before the deployment, a sales rep spent 45 minutes on each lead to decide whether it was worth calling. The agent does it in eight minutes and delivers a ready client card with an ICP fit score.
How it works step by step:
1. Webhook from the form or LinkedIn activates the agent
2. Agent collects company data: website, LinkedIn, company databases, latest news
3. Compares data against the Ideal Customer Profile (ICP) loaded from the vector database
4. Generates a 0-100 score with justification for each component
5. Updates the client card in the CRM (HubSpot or Pipedrive)
6. Sends a Slack notification with a recommendation: "Call today", "Send nurturing", "Disqualify"
The sales rep gets a notification with a summary: company, industry, revenue, score, three reasons why or why not. The decision on next steps takes 30 seconds instead of 45 minutes. With 20 leads per week that is 14 hours returned to high-value sales work.
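The scoring and recommendation steps can be sketched as a weighted sum with per-component reasons. The component names, the weights, and the 40/70 thresholds are assumptions for illustration (the thresholds borrow from the case study figures later in this article); in a real deployment the per-component fit values would come from the LLM's comparison against the ICP.

```python
from typing import Dict, List, Tuple

# Assumed ICP components and weights (they sum to 100); not from a real client.
ICP_WEIGHTS: Dict[str, int] = {"industry_match": 40, "company_size": 30, "budget": 30}


def score_lead(signals: Dict[str, float]) -> Tuple[int, List[str]]:
    """Combine per-component fit (0.0 to 1.0) into a 0-100 score with reasons."""
    total = 0.0
    reasons = []
    for component, weight in ICP_WEIGHTS.items():
        fit = max(0.0, min(1.0, signals.get(component, 0.0)))  # clamp bad input
        points = weight * fit
        total += points
        reasons.append(f"{component}: {points:.0f}/{weight}")
    return round(total), reasons


def recommend(score: int) -> str:
    # Assumed thresholds; calibrate them against real sales feedback.
    if score > 70:
        return "Call today"
    if score >= 40:
        return "Send nurturing"
    return "Disqualify"


score, reasons = score_lead({"industry_match": 1.0, "company_size": 0.5, "budget": 0.5})
```

Keeping the reasons next to the score matters more than the exact formula: it is what lets the sales rep accept or challenge the recommendation in seconds.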
Another example: brand mention monitoring agent. Every morning at 7am the agent scans Reddit, Twitter/X, Google News, and industry forums — identifies mentions of the client and competitors, assesses sentiment, flags reputation crises, and generates a PDF report with a summary and recommendations. Previously someone did this manually for two hours every day.
When an Agent Is Overkill
AI agents are not the answer to every problem. There are several situations where a simpler solution is better, cheaper, and more reliable.
| Situation | Recommendation | Why |
|---|---|---|
| Always same input to output | Zapier / Make | Cheaper and more reliable |
| Simple notifications | n8n basic flow | Unnecessary agent complexity |
| Multi-step, variable data | ReAct agent | Dynamic decisions |
| Multiple specialisations in parallel | Multi-agent | Division of responsibility |
| Tasks requiring > 90% accuracy | Agent + human-in-the-loop | Control of critical decisions |
When I do not deploy an agent:
- Contact form → save to CRM. That is a webhook, not an agent. Zapier for 20 dollars per month.
- Daily reports from the same sources with the same fixed structure. Cron job and Python script.
- Notifications after an event. n8n basic flow at zero cost per month.
- Processes with zero exceptions and a constant input-output schema.
When an agent makes sense:
- The task requires reasoning, not just passing data
- Input data is variable and unpredictable: each case is slightly different
- The output must be context-adapted, not template-based
- There are exceptions that need to be handled differently from standard cases
The simple test I use when qualifying a project: would I give this task to an intern with internet access and a CRM, and would they manage after a 30-minute briefing? If yes — an agent can do it. If it requires an expert with years of experience and deep business intuition — the agent will not cope or will be unreliable and expensive to fix.
How Much Does It Cost
The question that always comes up at this point in the conversation. And rightly so.
| Deployment type | Build cost | Monthly API | Example | Return on investment |
|---|---|---|---|---|
| Simple no-code agent (n8n/Make) | 1,500–4,000 PLN | 100–400 PLN | Email sorting, research | 2–6 weeks |
| Custom agent (Python/LangChain) | 5,000–15,000 PLN | 300–1,500 PLN | Proposal pipeline, sales agent | 4–10 weeks |
| Multi-agent system | 15,000–60,000 PLN | 1,000–5,000 PLN | Full automated department | 3–6 months |
To see this in numbers: if a sales email agent saves 20 hours per week and the hourly rate is 30 GBP — that is 600 GBP per week, 2,400 GBP per month. A no-code agent deployed for 500 GBP pays back within 15 days. Monthly API: around 50 GBP.
This is not a long-horizon investment. It is price arbitrage — pay once, save every month.
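The payback arithmetic from the sales email example can be checked in a few lines. This is a sketch: `payback_days` is a hypothetical helper, and spreading the monthly API bill evenly across the weeks of the year is a simplifying assumption.

```python
def payback_days(build_cost: float, monthly_api: float,
                 hours_saved_per_week: float, hourly_rate: float) -> float:
    """Days until cumulative net savings cover the one-off build cost."""
    weekly_saving = hours_saved_per_week * hourly_rate
    weekly_api = monthly_api * 12 / 52  # spread the API bill over weeks
    net_per_day = (weekly_saving - weekly_api) / 7
    if net_per_day <= 0:
        raise ValueError("the agent never pays back at these rates")
    return build_cost / net_per_day


# Figures from the example above: 500 GBP build, about 50 GBP/month API,
# 20 hours saved per week at 30 GBP/hour.
days = payback_days(build_cost=500, monthly_api=50, hours_saved_per_week=20, hourly_rate=30)
```

With those inputs the result is roughly six calendar days, comfortably inside the 15-day figure. The useful part is rerunning it with pessimistic inputs, for example half the hours saved, before committing to a build.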
API costs depend on the model and token count. GPT-4o mini costs a fraction of GPT-4o and, by my rough estimate, retains about 80% of the capability for most tasks. In the agents I deploy, I always match the model to the task. Classifying an email as "urgent/not urgent" does not need GPT-4o. Generating a 15,000 PLN proposal does.
An important note on API costs: an agent executing 10 steps to handle one lead can consume 3,000–8,000 tokens. With GPT-4o that is 0.10–0.25 USD. With 50 leads per week — 5–13 USD per week. At production scale I always run a token cost estimate before deployment.
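A pre-deployment token estimate can be as small as the sketch below. The blended price per 1,000 tokens is a placeholder derived from the figures in the paragraph above, not a quoted vendor rate; real pricing differs for input and output tokens and changes over time, so check the current price list before relying on any number.

```python
def weekly_api_cost(tokens_per_lead: int, leads_per_week: int,
                    usd_per_1k_tokens: float) -> float:
    """Estimate weekly API spend from a blended per-token price."""
    return tokens_per_lead * leads_per_week * usd_per_1k_tokens / 1000


# Placeholder blended price, NOT a vendor quote: it is what the article's own
# figures imply (0.10 USD for 3,000 tokens is about 0.033 USD per 1k tokens).
ASSUMED_USD_PER_1K = 0.10 / 3

low = weekly_api_cost(3_000, 50, ASSUMED_USD_PER_1K)
high = weekly_api_cost(8_000, 50, ASSUMED_USD_PER_1K)
```

For 50 leads per week this gives about 5 USD at the low end and roughly 13 USD at the high end. Swapping in a cheaper model's price shows quickly whether model choice matters at your volume.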
How to Start — 6 Steps
I do not start with technology. I start with the process.
1. Identify the most expensive repetitive task. Not "many tasks", one specific one: the one that costs you or your team the most time every week. Write it down.
2. Describe it step by step, from start to finish, as if briefing a new employee. What is the input? What decisions are made? What is the output? Where are the exceptions? This description becomes the agent blueprint.
3. Check data availability. The agent needs access to the same sources as the human doing the same task. Does the CRM have an API? Are emails accessible via IMAP or Gmail API? Do you have historical examples of good and bad outputs?
4. Choose the framework. For a company without a developer: n8n or Make with AI nodes. For a company with Python access: LangGraph or CrewAI. Do not over-engineer at the start. A simpler tool that works beats an advanced one sitting broken on a server.
5. Build an MVA (Minimum Viable Agent): one goal, three tools maximum, human oversight after each step. Test the agent on 20 sample cases. Then and only then extend with more tools and less human oversight.
6. Measure the result: time before vs. after deployment, output quality (does the sales rep use the draft or rewrite it from scratch?), cost of errors. Without measurement you do not know whether the agent is better than a human.
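Testing on sample cases and measuring the result can share one small harness. The sketch below compares an agent's output with a human label for each case; `toy_agent` is a keyword rule standing in for the real agent, and the four sample emails are invented.

```python
from typing import Callable, List, Tuple


def evaluate(agent: Callable[[str], str],
             cases: List[Tuple[str, str]]) -> dict:
    """Run the agent on labelled cases and report how often it matches a human."""
    if not cases:
        raise ValueError("need at least one labelled case")
    failures = []
    for case_input, expected in cases:
        actual = agent(case_input)
        if actual != expected:
            failures.append({"input": case_input, "expected": expected, "actual": actual})
    passed = len(cases) - len(failures)
    return {"accuracy": passed / len(cases), "failures": failures}


def toy_agent(email: str) -> str:
    # Stand-in for the real agent: a keyword rule, purely for the demo.
    return "urgent" if "asap" in email.lower() else "not urgent"


sample_cases = [
    ("Need a quote ASAP", "urgent"),
    ("Newsletter signup", "not urgent"),
    ("Site is down, call me", "urgent"),  # the toy agent misses this one
    ("Invoice question", "not urgent"),
]
report = evaluate(toy_agent, sample_cases)
```

The list of failures is more useful than the accuracy number: each entry is a concrete case to add to the prompt or to route to a human.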
The most common mistake I see: companies want to immediately build a multi-agent system to automate an entire marketing or sales department. This almost always fails — too many variables, too few tests, too high expectations. One agent, one task, one month of production operation — then the next.
What Deployment Looks Like in Practice — Case Study
Here is a concrete project: a lead qualification agent for an SEM agency, a 3-person sales team, around 30 leads per week.
Before deployment:
- Each lead required 40-50 minutes of work: checking the client's website, LinkedIn, advertising spend history, budget estimate
- Sales reps spent half their time on leads that never passed qualification anyway
- No consistent scoring system: each rep had their own intuitive criteria
Agent architecture:
1. Typeform webhook activates n8n
2. n8n calls a LangChain agent with GPT-4o mini
3. The agent has access to 4 tools: Playwright (website scraping), LinkedIn API (company data), SerpAPI (organic visibility), RAG memory with client ICP
4. The agent generates a client card: industry, size, estimated budget, ICP fit 0-100, justification and recommendation
5. The card goes to HubSpot as a deal with the right fields populated
6. Slack alert to the appropriate sales rep with priority level
After 30 days:
- Qualification time: 8 minutes instead of 45 minutes
- Leads rejected below score 40: 35% of all leads, so reps stopped wasting time on them
- Time to first contact with premium lead (score > 70): from 6 hours to 45 minutes
- ROI: the agent cost 800 GBP to build plus 35 GBP/month in API. With 3 reps each saving 6 hours per week at 35 GBP/hour (630 GBP per week), payback came within 18 days.
Key observation: during the first two weeks, reps reviewed every client card and gave feedback ("this should be 80, not 60, because of X"). This feedback went into the prompt as additional examples. After four weeks, scoring accuracy reached a level that satisfied the whole team.
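Feeding rep corrections back into the prompt can be as simple as appending them as worked examples. This is a sketch of one way to do it, not the exact mechanism used in the project; the `Feedback` shape, the function name, and the sample correction are hypothetical.

```python
from typing import List, TypedDict


class Feedback(TypedDict):
    company: str
    agent_score: int
    rep_score: int
    reason: str


def build_scoring_prompt(base_instructions: str, feedback: List[Feedback],
                         max_examples: int = 5) -> str:
    """Append the most recent rep corrections to the prompt as worked examples."""
    lines = [base_instructions, "", "Corrections from sales reps:"]
    for item in feedback[-max_examples:]:  # cap the count to control token use
        lines.append(
            f"- {item['company']}: scored {item['agent_score']}, "
            f"should be {item['rep_score']} because {item['reason']}"
        )
    return "\n".join(lines)


prompt = build_scoring_prompt(
    "Score the lead from 0 to 100 against the ICP.",
    [{"company": "Acme", "agent_score": 60, "rep_score": 80,
      "reason": "they already run paid search in-house"}],
)
```

Capping the number of examples keeps the prompt, and therefore the token bill, from growing with every correction.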
Common Mistakes When Deploying Agents
Over the last two years I have deployed a dozen agents for various companies. The same mistakes come up again and again.
Mistake 1: Tools that are too generic. An agent with access to "the internet" and "all CRM data" is weaker than an agent with access to three specific, well-defined tools. Precision beats breadth every time.
Mistake 2: No positive and negative examples. An LLM needs patterns. "A good lead is a company with 50+ employees and a budget above 3,000 GBP per month" is a better instruction than "assess fit against our ICP". Concrete examples are worth more than general definitions.
Mistake 3: No production monitoring. An agent operating autonomously without logs is a time bomb. Every tool call, every decision, and every result should be logged. Not to read constantly, but to have the ability to debug when something goes wrong.
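One low-effort way to get that logging is a decorator around every tool function. The sketch below uses only the Python standard library; `logged_tool` and `crm_lookup` are hypothetical names, and the tool body is a stub rather than a real CRM call.

```python
import functools
import json
import logging
from typing import Any, Callable

logger = logging.getLogger("agent.tools")


def logged_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log every call, its arguments, and its result or error as one JSON line."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record = {"tool": func.__name__, "args": args, "kwargs": kwargs}
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            record["error"] = repr(exc)
            logger.error(json.dumps(record, default=str))
            raise  # log the failure, then let the caller handle it
        record["result"] = result
        logger.info(json.dumps(record, default=str))
        return result

    return wrapper


@logged_tool
def crm_lookup(company: str) -> dict:
    # Hypothetical tool body; a real one would call the CRM API.
    return {"company": company, "open_deals": 0}
```

One JSON object per line keeps the log easy to search when a specific lead or decision needs to be traced. Where the lines end up (file, console, log service) depends on how `logging` is configured in the host application.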
Mistake 4: An overambitious MVP. The first agent should do one thing well — not three things adequately. Expansion comes with time and production experience.
---
If you have a specific process in mind and want to check whether an agent is the right choice — get in touch. We will start with a 30-minute conversation about what is eating your time, and assess together whether an agent is the right tool or whether a simpler flow will do.
