Does Computer Use work on desktop apps (Windows), not just browsers?

Yes. The agent can operate any window on the desktop: Win32 apps, Java Swing, legacy ERPs in desktop mode. It needs an environment it can access, either a browser or a virtual desktop (a VM reachable over VNC or RDP). Browser sessions are easier to stabilise; desktop UIs change less often but can be harder for vision models to parse.

How does Computer Use handle MFA / two-factor authentication?

This is the biggest pain point. Standard TOTP (code from an Authenticator app) can be handled by injecting the TOTP secret into the agent — it generates the code without human intervention. SMS OTP requires an SMS gateway integration or a human-in-the-loop module that asks an employee for the code. reCAPTCHA v3 and Cloudflare Turnstile are practically impassable without a dedicated CAPTCHA-solving service (which may violate the service's ToS).
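The TOTP case can be sketched with the Python standard library alone. This is a minimal illustration of RFC 6238 code generation, assuming the common authenticator defaults (HMAC-SHA1, 30-second step, 6 digits); a given service may use other parameters. The function name `totp` and the environment variable mentioned afterwards are hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Return the RFC 6238 TOTP code (HMAC-SHA1) for a Base32-encoded secret."""
    cleaned = secret_b32.upper().replace(" ", "")
    # Authenticator secrets are often shown without Base32 padding; restore it.
    padded = cleaned + "=" * (-len(cleaned) % 8)
    key = base64.b32decode(padded)

    # The moving factor is the number of whole time steps since the Unix epoch.
    now = time.time() if at is None else at
    counter = int(now // step)

    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test secret ("12345678901234567890" in Base32), evaluated at t = 59 s.
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", at=59))  # 287082
```

In a deployment the secret would come from a secrets manager or an environment variable (say, a hypothetical `PORTAL_TOTP_SECRET`) rather than from source code, because anyone holding it can generate valid codes.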

Does data the agent "sees" on screen get sent to a cloud AI model?

Yes. With a cloud model (Claude API, GPT-4o API) each screenshot is transmitted to the API as an image. If the screen shows personal data covered by GDPR, tax IDs, or account numbers, that data passes through the cloud provider's processing. The solution for sensitive data is a self-hosted, vision-capable model (for example Llama 3.2 Vision or Mistral's Pixtral; Llama 3.3 itself is text-only) in an isolated internal network: the vision model runs locally and no data leaves the infrastructure.

How fast is execution compared to an API or n8n?

Considerably slower. One step (screenshot → reason → action) takes 3–8 seconds. A task requiring 20 steps takes 1–3 minutes. API-driven automation performs the same flow in seconds. Computer Use is a tool for where there's no alternative — not where speed is critical. For batch processes (e.g. a nightly data update), the time difference has no operational significance.
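The arithmetic behind those figures is simple enough to check. The snippet below is a small sketch that multiplies the step count by the 3–8 second range quoted above; the function name is hypothetical and the range is taken from this article, not measured.

```python
def task_duration_seconds(steps: int, per_step=(3.0, 8.0)):
    """Return (best case, worst case) duration in seconds for a task of `steps` steps."""
    fastest, slowest = per_step
    return steps * fastest, steps * slowest


low, high = task_duration_seconds(20)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")  # 1.0 to 2.7 minutes
```

The same function shows why hundreds of interactions become expensive: 300 steps already land between 15 and 40 minutes.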

Can I test Computer Use before commissioning an expensive project?

Yes — Claude.ai (Pro/Team plan) has experimental Computer Use in the browser. OpenAI Operator works similarly. You can try a simple scenario yourself to get a feel for the capabilities and limitations. For production use, however, you need an architecture with API calls, task queuing, error handling, and monitoring — that's the engineering work I build for clients.

Which platform to choose — Claude Computer Use or OpenAI Operator?

In mid-2026 the two solutions are at different levels of maturity. Claude Computer Use (Anthropic) is available via API with full programmatic control, which makes it the preferred option for production deployments that need orchestration and custom logic. OpenAI Operator is oriented more toward a web-browsing assistant. For enterprise Computer Use with custom control, choose the Claude API. For a quick web prototype, choose Operator.


2026-06-24 | Updated: 2026-06-24 | AI & Automation | 12 min

Computer Use — AI That Operates Any App Like a Human (No API, No Integration Required)

Computer Use is a technique where an AI agent takes control of a computer: it takes screenshots, analyses what it sees, then clicks, types, and navigates — with no API and no special integration. Your old ERP from the 90s, a government portal with no public API, an application only accessible via Citrix — the agent sees what the employee sees and follows the same steps. In 2026, Claude (Anthropic) and OpenAI Operator are mature implementations of this technology. For businesses, this means one thing: the excuse of "that system has no API" no longer holds.

Your 2003 legacy ERP, a government portal with no API, an old booking system that only runs on Internet Explorer — AI can now operate all of it like a human: clicking, filling forms, reading the screen and making decisions. Computer Use is the biggest shift in automation since RPA. Here's how it works, when it makes more sense than API/n8n, what it costs, and why it replaces an entire class of $200K+ tooling.

Every automation engineer hits the same wall eventually. "Great idea, but our ERP has no API." "We want to automate submissions on the government portal, but there's no webhook." "We have software from 2005 and the vendor went out of business." For years the answers were: an expensive migration, an expensive RPA deployment costing $50K+, or nothing.

Computer Use changes that calculation.

How it works technically — the see, think, act loop

A Computer Use agent operates in a simple, repeatable loop:

/// ACTION LOOP: COMPUTER USE AGENT

Screenshot (agent sees the UI like a human) → Vision + Reason (LLM analyses what is on screen) → Action (click, type, scroll, keyboard shortcut) → Verify (new screenshot: was the goal reached?) → Loop or STOP (next step or task completion)

0 APIs REQUIRED | 45–65% PROCESSES AUTOMATED | ~$200K+ RPA COST IT REPLACES

Step 1 — Screenshot. The agent takes a screenshot of the current screen state. It doesn't see the HTML code or the DOM structure — it sees pixels, exactly like a human.

Step 2 — Vision + Reason. A multimodal model (GPT-4o, Claude 3.5/3.7) analyses the image. It identifies buttons, form fields, error messages, data tables. It understands the context: "I'm on the login page, I need to type the password".

Step 3 — Action. The agent issues a command: click coordinates (X, Y), type text, press Enter, scroll the page, use a keyboard shortcut. The action is executed by a driver (pyautogui, Playwright, xdotool, or the OS's native API).

Step 4 — Verify. A new screenshot. Did I reach the goal? Did an error appear? Do I need another step?

The loop continues until the task is complete or the agent hits a blocker it can't get past (CAPTCHA, two-step verification with SMS code, ambiguous interface).

The key point: the agent understands what it's doing rather than running a rigid script. If the interface changes slightly — a button moved 20 pixels — the agent notices and takes the correct action. This is the fundamental difference from classic RPA.
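The four steps can be sketched as a small driver loop. This is a minimal illustration, not any vendor's SDK: `Action`, `run_agent`, and the three injected callables are hypothetical names, and the model call and the input driver are replaced by stubs so the sketch runs without a screen or an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    kind: str  # "click", "type", "scroll", "key", or the terminal "done" / "blocked"
    payload: dict = field(default_factory=dict)


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 30,
) -> Tuple[str, List[Action]]:
    """Run the screenshot -> reason -> action loop until done, blocked, or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()              # Step 1, and Step 4 on later passes
        action = decide(goal, screenshot, history)  # Step 2: the vision model picks a move
        if action.kind in ("done", "blocked"):      # goal reached, or CAPTCHA / SMS code
            return action.kind, history
        execute(action)                             # Step 3: the driver clicks or types
        history.append(action)
    return "max_steps", history                     # safety stop for a stuck agent


# Stubs standing in for the real screen, model, and driver.
script = [Action("click", {"x": 120, "y": 48}), Action("type", {"text": "hello"}), Action("done")]
executed: List[Action] = []

status, history = run_agent(
    goal="submit the form",
    take_screenshot=lambda: b"fake-png-bytes",
    decide=lambda goal, shot, hist: script[len(hist)],
    execute=executed.append,
)
print(status, [a.kind for a in history])  # done ['click', 'type']
```

The `max_steps` guard is a design choice worth keeping in any real version: without it, an agent that misreads the screen can loop indefinitely and keep paying for model calls.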

Computer Use vs API vs n8n vs RPA — when to use what

There is no single tool for everything. Each approach has its context:

Approach	When to use	Cost	UI-change resilience	Requires dev?
API / webhook	System has a public API (REST, GraphQL)	Low	High — UI irrelevant	Yes (config)
n8n / Make / Zapier	Ready-made connectors, flow logic	Low / medium	High	No / a little
RPA (UiPath, Blue Prism)	Stable UI, large enterprise deployments	Very high ($50K–300K+)	Low — brittle	Yes + certification
Computer Use (AI)	No API, legacy, unstable UI, fast start	Medium (LLM costs)	High — adapts	Minimally
Self-hosted LLM + CU	Sensitive data, no cloud	High (GPU)	High	Yes

Rule of thumb: if the system has an API — use the API. If it doesn't, data is sensitive, and volume is high — consider Computer Use with a self-hosted LLM. If data isn't confidential — cloud-based Computer Use (Claude/GPT-4o) is the fastest path.
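The rule of thumb translates into a few lines of code. The function below is only a sketch of the decision as stated here; `choose_approach` and its return strings are hypothetical, and the case the rule does not cover (sensitive data at low volume) is returned as undecided instead of being guessed.

```python
def choose_approach(has_api: bool, sensitive_data: bool, high_volume: bool) -> str:
    """Encode the rule of thumb; returns a label, not a binding recommendation."""
    if has_api:
        return "API"
    if sensitive_data and high_volume:
        return "Computer Use + self-hosted LLM"
    if not sensitive_data:
        return "Cloud Computer Use (Claude/GPT-4o)"
    # Sensitive data at low volume is not covered by the rule as written.
    return "Undecided: evaluate case by case"


print(choose_approach(has_api=False, sensitive_data=False, high_volume=False))
# Cloud Computer Use (Claude/GPT-4o)
```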

Where Computer Use truly shines — real-world use cases

Many businesses are surrounded by systems with no API. Here are the scenarios where Computer Use delivers the most value:

1. Government portals (tax authorities, social insurance, official registries). Manual data entry into government systems takes accountants and HR teams hours each week. Many of these portals have no API for small businesses. A Computer Use agent logs in, navigates to the right form, enters data from a prepared JSON file, and confirms the submission. Processing time per form: 2–4 minutes instead of 15–20.

2. Legacy ERP with no API module. Older versions of desktop ERPs, or custom in-house systems, operate through a desktop interface. The agent sees the application window, reads fields, fills them with order data and clicks "Confirm". No migration to a new system, no developer work on the ERP side.

3. Customer and supplier portals. Checking order statuses on B2B customer platforms (when they don't offer an API), downloading invoices from supplier portals, reporting into retail chains' supplier portals: the agent performs all these tasks as a logged-in employee.

4. QA automation. The agent runs through test scenarios for a web application, clicks, fills forms, and verifies that the result matches expectations. For unstable UIs this is cheaper to maintain than Selenium scripts, because the agent adapts to changes instead of breaking on them.

5. Desk research and data collection. Browsing dozens of pages looking for specific information (competitor prices, registry data, availability statuses) where HTML scraping is blocked. The agent sees what the browser sees.

Limitations — what Computer Use still can't do well

Honesty requires listing the weak points:

CAPTCHA and strong two-factor authentication. Systems actively defending against bots (reCAPTCHA v3, Cloudflare Turnstile) effectively block agents. There's no good solution without human intervention.
Complex, dynamic UIs. Interfaces with animated canvases, generated SVGs, or custom components are harder for vision models to analyse.
Slow execution. The screenshot–reason–action loop takes 3–8 seconds per step. For processes requiring hundreds of interactions, the time and financial costs grow; an API is always faster.
LLM costs at high volume. Every screenshot is several thousand vision tokens. At 1,000 operations per day, API costs can become significant — worth calculating before deploying.
Security and data confidentiality. The agent sees the screen — if the screen contains sensitive data, it goes to the cloud model. For GDPR data or commercial secrets, a local environment is required (self-hosted LLM + isolated virtual machine).

What it costs — a real calculation

A simplified calculation for a typical scenario (government form submission, 200 submissions/month):

Component	Estimated cost	Notes
Agent build (one-time)	€500–1,500	Depends on UI complexity and number of scenarios
LLM API costs (monthly)	€35–100	Claude/GPT-4o, ~200 operations, avg. 15 steps/op.
Infrastructure (server/VPS)	€12–40/mo	Dedicated desktop VM with browser
Maintenance (quarterly)	€120–400	Updates when UI changes

For comparison, an enterprise RPA deployment (UiPath, Blue Prism) for the same process costs $50,000–$200,000+ plus annual licences. Computer Use isn't free, but it lowers the entry cost by roughly an order of magnitude or more.
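The table can be rolled up into a first-year total and an LLM cost per step. The sketch below uses only the figures from the table, under two assumptions of mine: low ends are summed with low ends (and high with high), and quarterly maintenance is paid four times in the first year. All names are hypothetical.

```python
OPS_PER_MONTH = 200   # submissions per month, from the scenario above
STEPS_PER_OP = 15     # average steps per operation, from the table notes

# (low, high) estimates in euros, copied from the table.
COSTS = {
    "build_once": (500, 1500),
    "llm_monthly": (35, 100),
    "infra_monthly": (12, 40),
    "maintenance_quarterly": (120, 400),
}


def first_year_total(costs: dict) -> tuple:
    """Sum one-time, monthly (x12) and quarterly (x4) costs for the low and high case."""
    return tuple(
        costs["build_once"][i]
        + 12 * (costs["llm_monthly"][i] + costs["infra_monthly"][i])
        + 4 * costs["maintenance_quarterly"][i]
        for i in (0, 1)
    )


def llm_cost_per_step(costs: dict) -> tuple:
    """Monthly LLM spend divided by the number of agent steps per month."""
    steps = OPS_PER_MONTH * STEPS_PER_OP  # 3,000 steps per month
    return tuple(costs["llm_monthly"][i] / steps for i in (0, 1))


low, high = first_year_total(COSTS)
print(f"First year: €{low:,} to €{high:,}")            # First year: €1,544 to €4,780
step_low, step_high = llm_cost_per_step(COSTS)
print(f"LLM per step: €{step_low:.3f} to €{step_high:.3f}")  # LLM per step: €0.012 to €0.033
```

Swapping in your own volume and step count is the quickest way to see whether the LLM line item stays small or starts to dominate.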

Real case study: supplier portal with no API

Last quarter a client — a manufacturing company — needed to check order statuses on their main buyer's B2B portal every morning and update their own ERP. The buyer's portal offered no API. Manual work took 45–60 minutes daily.

I built a Computer Use agent that logs into the portal at 7:30, goes through the order list, collects statuses and delivery dates, then updates records in the client's own ERP via its API (which it did have). Full flow: 8–12 minutes, fully unattended.

ROI: payback in under 3 months. The employee gained an hour per day for tasks requiring actual decision-making.

Computer Use as the "last mile" layer

The best Computer Use deployments I build use this technology as a last-mile layer — not replacing the entire architecture, but filling a specific gap.

The pattern: n8n orchestrates the flow → API where available → Computer Use where there's no API → result returns to the system via API. This approach combines the speed and reliability of API-driven automation with the flexibility of a vision agent.
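In code, the pattern comes down to a routing decision per target system. The sketch below is a simplified stand-in for what an orchestrator such as n8n would do: `run_task`, the `api_clients` registry, and the stub handlers are all hypothetical, and a real Computer Use handler would wrap an agent loop instead of returning canned data.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def run_task(task: dict, api_clients: Dict[str, Handler], computer_use: Handler) -> dict:
    """Use the system's API client when one is registered, else fall back to Computer Use."""
    system = task["system"]
    handler = api_clients.get(system)
    if handler is not None:
        route, result = "api", handler(task)
    else:
        route, result = "computer_use", computer_use(task)
    return {"system": system, "route": route, "result": result}


# Stub handlers: the ERP has an API, the buyer's portal does not.
api_clients = {"erp": lambda task: {"updated": len(task.get("orders", []))}}
computer_use = lambda task: {"statuses": {"PO-1": "shipped"}}

portal = run_task({"system": "buyer_portal"}, api_clients, computer_use)
erp = run_task({"system": "erp", "orders": ["PO-1"]}, api_clients, computer_use)
print(portal["route"], erp["route"])  # computer_use api
```

Keeping the fallback behind one function means a system that later gains an API can be moved to the fast path by registering a client, without touching the rest of the flow.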

Frequently asked questions — Computer Use


/// AUTHOR

Paweł Wiszniewski

SEO & GEO Specialist & AI Engineer

SEO/GEO specialist (10 years) and AI engineer (3 years). I build search visibility, AI systems and automations that reduce costs and improve operational efficiency.

