Getting started with Computer Use

What you’ll have at the end#

A Docker container running a sandboxed Ubuntu desktop, with Claude clicking through it from your terminal, doing whatever task you ask. About 15 minutes if Docker is already installed.

Before you start#

An Anthropic API key with access to a Computer Use-capable model (Sonnet 4.6 or Opus 4.7 at time of writing — verify at docs.anthropic.com)
Docker Desktop installed (macOS / Windows / Linux)
About 8 GB RAM free for the desktop container
A spending cap on your API key — Computer Use burns tokens fast (see §CU.1 for why)

Step 1 — Clone the quickstart#

git clone https://github.com/anthropics/anthropic-quickstarts.git
cd anthropic-quickstarts/computer-use-demo

Read the README. It is the source of truth for the demo’s current state; this page is a tour, not a substitute.

Step 2 — Set your API key#

export ANTHROPIC_API_KEY=sk-ant-...

If you want to use Bedrock or Vertex instead, the demo accepts an API_PROVIDER env var:

# Bedrock
docker run -e API_PROVIDER=bedrock -e AWS_PROFILE=$AWS_PROFILE ...
# Vertex
docker run -e API_PROVIDER=vertex -e CLOUD_ML_REGION=us-central1 -e ANTHROPIC_VERTEX_PROJECT_ID=... ...

Check the quickstart README for the exact provider-specific environment variables to pass — they’re documented per-provider and change with each release.

Step 3 — Run the container#

docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5900:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    -p 8080:8080 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

(Check the README for the current image name + ports — the above is a snapshot.)

The container boots a virtual desktop (Xvfb + a window manager + Firefox + LibreOffice + a file manager). It exposes:

Port 8080 — the demo web UI (open this in your browser)
Port 6080 — a noVNC view of the desktop (live screen the agent sees)
Port 8501 — Streamlit chat surface
Port 5900 — raw VNC (optional, for connecting with a desktop VNC client)

Open http://localhost:8080 in your browser. You should see a split-pane UI: chat on the left, the live virtual desktop on the right.

Step 4 — Run your first task#

In the chat panel:

Open Firefox and search for “what is MCP” on Google. Show me the first result.

What happens (you watch in real time via the right pane):

Claude takes a screenshot
Decides to click the Firefox icon in the dock
Your harness clicks, takes another screenshot
Claude sees the new Firefox window, clicks the URL bar, types “google.com”
Hits Enter, sees Google, clicks the search box, types the query
Hits Enter, sees results, reads the first one, summarises in chat

Each step takes 5–15 seconds (the API call dominates). The full task is maybe 90 seconds.

Step 5 — Watch the cost#

In a second terminal:

# If you logged usage from the demo, tail it:
docker logs <container-id> | grep "usage"

Or check the Anthropic Console → Usage. Computer Use tasks are EXPENSIVE per session. A 30-action task could be $1–$5 depending on which model + how many screenshots. Across many runs, this adds up.

Adapting for your own task#

The quickstart’s harness is in computer_use_demo/loop.py (Python). The pattern:

System prompt describing what you want
Tool list (computer, bash, text_editor) — Claude can use any combination
Loop: call API → execute returned tool actions → screenshot → call API again

To adapt:

Change the system prompt to match your task
Mount additional volumes if your task needs access to data on your host
Pre-install software in a derived Docker image (custom Dockerfile FROM the demo image)
Add tools if Claude needs more than computer + bash + text_editor (e.g. a custom API call)

The four checks before each run#

What’s the worst case if it goes wrong? If the answer is “loses a day’s work” or “sends real money” — back to sandbox-design. The agent will make mistakes.
What’s logged in / authenticated in the sandbox? Real session cookies = real consequences. Use throwaway accounts where possible.
What’s the spend cap? Set it BEFORE the run, not after.
Is there a kill switch? A way to stop the agent mid-action if it goes off-rails. Ctrl+C works; sometimes you want a faster signal (close the container window).

Common pitfalls#

Symptom	Cause	Fix
Container crashes immediately	Insufficient RAM	Bump Docker Desktop’s memory limit to 8+ GB
`Permission denied` on the mounted volume	macOS / Linux file ownership mismatch	Add `--user $(id -u):$(id -g)` to docker run
Claude clicks the wrong button	UI layout doesn’t match what Claude expected	Tighten the prompt; describe the UI more explicitly
Long task fails halfway	Context window filling up	Break into shorter subtasks; reset the context periodically
Screenshots show black/blank desktop	Xvfb / display server not started	Restart container; check logs for “xvfb: started”
API returns 400 about tool definitions	Wrong API version or wrong tool schema	Check the README for the exact `anthropic-version` header to use

What’s NOT in the demo (and you’d have to build)#

Persistent state. Each container restart is fresh. To persist (browser history, app state), use a Docker volume.
Real apps / accounts. The demo has Firefox + LibreOffice; if your task needs Slack, Photoshop, your bank’s app — you install + configure those.
Production-grade safety rails. Approvals on dangerous actions, audit logging, kill-switch UI. The demo is for learning; production needs more.

What to do next#

§CU.1 Computer Use overview — the wider context if you skipped it
§API.4 Common patterns — the tool-use loop in detail
§CC.5 MCP integration — if your task is “let Claude Code drive my computer,” MCP is often a better fit

`⌘` + `K` · `/`	open search
`j`	next entry (within section)
`k`	previous entry
`g` `h`	go to home
`g` `m`	go to methodology
`?`	show this help
`esc`	close any modal