Computer Use — AI That Operates Any App Like a Human (No API, No Integration Required)
Computer Use is a technique where an AI agent takes control of a computer: it takes screenshots, analyses what it sees, then clicks, types, and navigates — with no API and no special integration. Your old ERP from the 90s, a government portal with no public API, an application only accessible via Citrix — the agent sees what the employee sees and follows the same steps. In 2026, Claude (Anthropic) and OpenAI Operator are mature implementations of this technology. For businesses, this means one thing: the excuse of "that system has no API" no longer holds.
Your 2003 legacy ERP, a government portal with no API, an old booking system that only runs on Internet Explorer — AI can now operate all of it like a human: clicking, filling forms, reading the screen and making decisions. Computer Use is the biggest shift in automation since RPA. Here's how it works, when it makes more sense than API/n8n, what it costs, and why it replaces an entire class of $200K+ tooling.
Every automation engineer hits the same wall eventually. "Great idea, but our ERP has no API." "We want to automate submissions on the government portal, but there's no webhook." "We have software from 2005 and the vendor went out of business." For years the answers were: an expensive migration, an expensive RPA deployment costing $50K+, or nothing.
Computer Use changes that calculation.
How it works technically — the see, think, act loop
A Computer Use agent operates in a simple, repeatable loop:
/// ACTION LOOP: COMPUTER USE AGENT
Step 1 — Screenshot. The agent takes a screenshot of the current screen state. It doesn't see the HTML code or the DOM structure — it sees pixels, exactly like a human.
Step 2 — Vision + Reason. A multimodal model (GPT-4o, Claude 3.5/3.7) analyses the image. It identifies buttons, form fields, error messages, data tables. It understands the context: "I'm on the login page, I need to type the password".
Step 3 — Action. The agent issues a command: click coordinates (X, Y), type text, press Enter, scroll the page, use a keyboard shortcut. The action is executed by a driver (pyautogui, Playwright, xdotool, or the OS's native API).
Step 4 — Verify. A new screenshot. Did I reach the goal? Did an error appear? Do I need another step?
The loop continues until the task is complete or the agent hits a blocker it can't get past (CAPTCHA, two-step verification with SMS code, ambiguous interface).
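The four steps above can be sketched as a small control loop. This is a minimal illustration, not a vendor SDK: `Action`, `run_agent`, and the three callables are hypothetical names, and the model and the driver are replaced by stubs so the loop can run without a screen or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    # Hypothetical action vocabulary; real agents define their own.
    kind: str          # "click", "type", "done" or "blocked"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 30,
) -> str:
    """Repeat screenshot -> reason -> act until done, blocked, or out of steps."""
    for _ in range(max_steps):
        screen = take_screenshot()        # Step 1, and Step 4 on later passes
        action = decide(goal, screen)     # Step 2: the model picks the next action
        if action.kind in ("done", "blocked"):
            return action.kind            # goal reached, or a blocker such as a CAPTCHA
        execute(action)                   # Step 3: the driver performs the action
    return "max_steps"


def demo() -> Tuple[str, List[str]]:
    """Drive the loop against a fake three-screen login flow."""
    state = {"screen": "login"}
    log: List[str] = []

    def take_screenshot() -> bytes:
        # Stand-in for a real screen capture: the "pixels" are just the screen name.
        return state["screen"].encode()

    def decide(goal: str, screen: bytes) -> Action:
        # Stand-in for the multimodal model.
        name = screen.decode()
        if name == "login":
            return Action("type", text="secret")
        if name == "password_typed":
            return Action("click", x=100, y=200)
        return Action("done")

    def execute(action: Action) -> None:
        # Stand-in for the driver: record the action and advance the fake UI.
        log.append(action.kind)
        state["screen"] = "password_typed" if action.kind == "type" else "dashboard"

    return run_agent("log in", take_screenshot, decide, execute), log
```

Calling `demo()` returns `("done", ["type", "click"])`. In a real deployment `take_screenshot` and `execute` would wrap a driver such as pyautogui or Playwright, and `decide` would call the vision model; the `max_steps` cap is an added safeguard against loops that never reach the goal.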
The key point: the agent understands what it's doing rather than running a rigid script. If the interface changes slightly — a button moved 20 pixels — the agent notices and takes the correct action. This is the fundamental difference from classic RPA.
Computer Use vs API vs n8n vs RPA — when to use what
There is no single tool for everything. Each approach has its context:
| Approach | When to use | Cost | UI-change resilience | Requires dev? |
|---|---|---|---|---|
| API / webhook | System has a public API (REST, GraphQL) | Low | High — UI irrelevant | Yes (config) |
| n8n / Make / Zapier | Ready-made connectors, flow logic | Low / medium | High | No / a little |
| RPA (UiPath, Blue Prism) | Stable UI, large enterprise deployments | Very high ($50K–300K+) | Low — brittle | Yes + certification |
| Computer Use (AI) | No API, legacy, unstable UI, fast start | Medium (LLM costs) | High — adapts | Minimally |
| Self-hosted LLM + Computer Use | Sensitive data, no cloud | High (GPU) | High | Yes |
Rule of thumb: if the system has an API, use the API. If it has none, the data is sensitive, and the volume is high, consider Computer Use with a self-hosted LLM. If it has none and the data isn't confidential, cloud-based Computer Use (Claude/GPT-4o) is the fastest path.
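The rule of thumb can be written down as a tiny decision function. This is a sketch with hypothetical names (`choose_approach` and its return labels). One assumption is added: sensitive data is routed to the self-hosted option regardless of volume, since the rule itself does not cover the sensitive, low-volume case.

```python
def choose_approach(has_api: bool, sensitive_data: bool) -> str:
    """Return a label for the approach suggested by the rule of thumb."""
    if has_api:
        return "api"                        # an API always wins when it exists
    if sensitive_data:
        # Assumption: sensitive data stays local even at low volume.
        return "computer_use_self_hosted"
    return "computer_use_cloud"             # fastest path for non-confidential data
```

For example, a government portal with no API and non-confidential data maps to `choose_approach(False, False)`, which returns `"computer_use_cloud"`.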
Where Computer Use truly shines — real-world use cases
Many businesses are surrounded by systems with no API. Here are the scenarios where Computer Use delivers the most value:
1. Government portals (tax authorities, social insurance, official registries). Manual data entry into government systems takes accountants and HR teams hours each week. Many of these portals have no API for small businesses. A Computer Use agent logs in, navigates to the right form, enters data from a prepared JSON file, and confirms the submission. Processing time per form: 2–4 minutes instead of 15–20.
2. Legacy ERP with no API module. Older versions of desktop ERPs, or custom in-house systems, operate through a desktop interface. The agent sees the application window, reads fields, fills them with order data and clicks "Confirm". No migration to a new system, no developer work on the ERP side.
3. Customer and supplier portals. Checking order statuses on B2B customer platforms (when they don't offer an API), downloading invoices from supplier portals, reporting into retail chain portals for suppliers: the agent performs all these tasks as a logged-in employee.
4. QA automation. The agent runs through test scenarios for a web application, clicks, fills forms, and verifies that the result matches expectations. Cheaper to maintain than Selenium scripts for unstable UIs because it adapts to changes.
5. Desk research and data collection. Browsing dozens of pages looking for specific information (competitor prices, registry data, availability statuses) where HTML scraping is blocked. The agent sees what the browser sees.
Limitations — what Computer Use still can't do well
Honesty requires listing the weak points:
- CAPTCHA and strong two-factor authentication. Systems actively defending against bots (reCAPTCHA v3, Cloudflare Turnstile) effectively block agents. There's no good solution without human intervention.
- Complex, dynamic UIs. Interfaces with animated canvases, generated SVGs, or custom components are harder for vision models to analyse.
- Slow execution. The screenshot–reason–action loop takes 3–8 seconds per step. For processes requiring hundreds of interactions, the time and financial costs grow; an API is always faster.
- LLM costs at high volume. Every screenshot is several thousand vision tokens. At 1,000 operations per day, API costs can become significant — worth calculating before deploying.
- Security and data confidentiality. The agent sees the screen — if the screen contains sensitive data, it goes to the cloud model. For GDPR data or commercial secrets, a local environment is required (self-hosted LLM + isolated virtual machine).
What it costs — a real calculation
A simplified calculation for a typical scenario (government form submission, 200 submissions/month):
| Component | Estimated cost | Notes |
|---|---|---|
| Agent build (one-time) | €500–1,500 | Depends on UI complexity and number of scenarios |
| LLM API costs (monthly) | €35–100 | Claude/GPT-4o, ~200 operations, avg. 15 steps/op. |
| Infrastructure (server/VPS) | €12–40/mo | Dedicated desktop VM with browser |
| Maintenance (quarterly) | €120–400 | Updates when UI changes |
Comparison: enterprise RPA deployment (UiPath, Blue Prism) for the same process: $50,000–$200,000+ plus annual licences. Computer Use isn't free, but it changes the order of magnitude of the entry cost.
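The table figures can be turned into a rough first-year total. The sketch below only does arithmetic on the numbers quoted in this article (200 operations per month, 15 steps per operation, 3–8 seconds per step). The function name is hypothetical, and it assumes maintenance is paid every quarter and that none of the ranges change over the year.

```python
OPS_PER_MONTH = 200     # from the scenario above
STEPS_PER_OP = 15       # average quoted in the table


def first_year_cost(build: int, llm_monthly: int, infra_monthly: int,
                    maintenance_quarterly: int) -> int:
    """One-time build plus 12 months of LLM and infrastructure plus 4 maintenance rounds (EUR)."""
    return build + 12 * (llm_monthly + infra_monthly) + 4 * maintenance_quarterly


low = first_year_cost(500, 35, 12, 120)       # lower bound of every range
high = first_year_cost(1500, 100, 40, 400)    # upper bound of every range

monthly_steps = OPS_PER_MONTH * STEPS_PER_OP  # agent steps per month
runtime_hours_low = monthly_steps * 3 / 3600  # at 3 seconds per step
runtime_hours_high = monthly_steps * 8 / 3600 # at 8 seconds per step
llm_cost_per_op_low = 35 / OPS_PER_MONTH      # EUR per submission, low end
llm_cost_per_op_high = 100 / OPS_PER_MONTH    # EUR per submission, high end
```

Under these assumptions the first year lands between €1,544 and €4,780, the agent runs 3,000 steps a month (roughly 2.5 to 6.7 hours of execution time), and the LLM share works out to about €0.18–0.50 per submission.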
Real case study: supplier portal with no API
Last quarter a client — a manufacturing company — needed to check order statuses on their main buyer's B2B portal every morning and update their own ERP. The buyer's portal offered no API. Manual work took 45–60 minutes daily.
I built a Computer Use agent that logs into the portal at 7:30, goes through the order list, collects statuses and delivery dates, then updates records in the client's own ERP via its API (which it did have). Full flow: 8–12 minutes, fully unattended.
ROI: payback in under 3 months. The employee gained an hour per day for tasks requiring actual decision-making.
Computer Use as the "last mile" layer
The best Computer Use deployments I build use this technology as a last-mile layer — not replacing the entire architecture, but filling a specific gap.
The pattern: n8n orchestrates the flow → API where available → Computer Use where there's no API → result returns to the system via API. This approach combines the speed and reliability of API-driven automation with the flexibility of a vision agent.
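The routing idea behind this pattern can be sketched in a few lines. In practice n8n would do the orchestration; the Python below is only a stand-in to show the decision, and every name in it (`run_flow`, `fake_portal_agent`, `fake_erp_api`, the system labels, the sample order) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

ApiHandler = Callable[[dict], dict]
AgentHandler = Callable[[str, dict], dict]


def run_flow(
    systems: List[str],
    api_handlers: Dict[str, ApiHandler],
    computer_use: AgentHandler,
) -> Tuple[dict, List[Tuple[str, str]]]:
    """Pass a payload through each system: API when one exists, vision agent otherwise."""
    payload: dict = {}
    trace: List[Tuple[str, str]] = []
    for system in systems:
        handler = api_handlers.get(system)
        if handler is not None:
            payload = handler(payload)               # fast, reliable path
            trace.append((system, "api"))
        else:
            payload = computer_use(system, payload)  # last-mile fallback
            trace.append((system, "computer_use"))
    return payload, trace


def fake_portal_agent(system: str, payload: dict) -> dict:
    # Stand-in for a Computer Use run that reads order statuses off the screen.
    return {**payload, "statuses": {"PO-1": "shipped"}}


def fake_erp_api(payload: dict) -> dict:
    # Stand-in for an ERP API call that writes the collected statuses.
    return {**payload, "erp_updated": len(payload["statuses"])}


result, trace = run_flow(
    ["buyer_portal", "client_erp"],
    {"client_erp": fake_erp_api},   # only the ERP has an API
    fake_portal_agent,
)
```

Here `trace` ends up as `[("buyer_portal", "computer_use"), ("client_erp", "api")]`: the vision agent is used only for the system that has no API, and everything else stays on the API path.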
