Webwright

A tier · New this week

Microsoft Research's open-source web agent framework that turns your coding model into a browser-automating power user — by giving it a terminal and Playwright instead of a clicky GUI.


Kai's verdict

Webwright's code-first philosophy is genuinely clever — treating browser sessions as disposable artifacts and scripts as the real output is a smarter abstraction than most click-replay agents, and the benchmark gains back that up. That said, this is a research repo, not a product, so expect rough edges and bring your own API budget. (Verdict pending Phi's full review.)
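The verdict's contrast with click-replay agents comes down to how per-step errors compound. As a back-of-the-envelope sketch, assume each predicted action succeeds independently with the same probability. The 98% figure below is purely illustrative, not a measured Webwright or benchmark number.

```python
def chain_success(p_step: float, steps: int) -> float:
    """Probability that all `steps` independent actions succeed."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return p_step ** steps


# Assumption: 98% reliability per action, independent across steps.
for steps in (10, 50, 100):
    print(f"{steps:>3} steps: {chain_success(0.98, steps):.3f}")
```

This prints roughly 0.817, 0.364 and 0.133, so even a fairly reliable per-click policy rarely survives a 100-step task. The intuition behind the code-first design is that a script which has already run end to end replaces many separate predictions with one replayable artifact, although real-world failures are neither independent nor uniform.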

Strengths

Code-over-clicks architecture: writes reusable Playwright scripts instead of fragile pixel-level click predictions, dramatically cutting error accumulation on long-horizon tasks
SOTA benchmark numbers: 86.7% on Online-Mind2Web and 60.1% on Odysseys with GPT-5.4, the highest score among open-source harnesses in the Online-Mind2Web AutoEval category
Tiny, auditable codebase (~1,000 lines across 3 modules) with no hidden orchestration layers — easy to fork, debug, and extend
Reusable CLI tool output: completed task scripts can be parameterized, exported, and shared across Claude Code, Codex, and other agents
Works with smaller models: Qwen3.5-9B hits 66.2% on the hard Online-Mind2Web split when paired with pre-built tool scripts
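The "reusable CLI tool" idea is easier to picture with a concrete artifact. Below is a hypothetical example of the kind of parameterized Playwright script a code-first agent might leave behind. The target URL, the `q`/`page` parameters and every function name are invented for illustration; this is not Webwright's actual export format. Only the Playwright sync API calls are real, and running the browser part needs `pip install playwright` followed by `playwright install chromium`.

```python
import argparse
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://example.com/search"  # hypothetical target site


def build_search_url(query: str, page: int = 1) -> str:
    """Map the task's parameters onto the URL the script navigates to."""
    if page < 1:
        raise ValueError("page must be >= 1")
    return f"{BASE_URL}?{urlencode({'q': query, 'page': page})}"


def build_parser() -> argparse.ArgumentParser:
    """Expose the task parameters as a small CLI other agents can call."""
    parser = argparse.ArgumentParser(description="Parameterized search task (hypothetical)")
    parser.add_argument("query", help="search terms")
    parser.add_argument("--page", type=int, default=1, help="results page (1-based)")
    parser.add_argument("--headed", action="store_true", help="show the browser window")
    return parser


def run(query: str, page: int = 1, headless: bool = True) -> str:
    """Open the results page in Chromium and return the page title."""
    # Lazy import: the helpers above work even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=headless)
        try:
            tab = browser.new_page()
            tab.goto(build_search_url(query, page))
            return tab.title()
        finally:
            browser.close()


def main(argv: Optional[Sequence[str]] = None) -> None:
    """Parse CLI arguments and run the task once."""
    args = build_parser().parse_args(argv)
    print(run(args.query, args.page, headless=not args.headed))
```

To run it from a shell, call `main()` under the usual `if __name__ == "__main__":` guard, e.g. `python search_task.py "usb c hub" --page 2`. Keeping the URL logic in a plain function means the parameterization can be checked without launching a browser.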

Weaknesses

Developer-only: purely terminal/CLI — no GUI, no hosted service, requires local setup with Python, Playwright, and your own LLM API keys
Token costs can be substantial: long coding trajectories require context compaction every 20 steps, and complex tasks can run 50–100 steps deep
Early-stage research release: minimal documentation, no managed hosting, and real-world reliability outside benchmark conditions is unproven
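The review says only that context is compacted every 20 steps, not how. A minimal sketch of the general idea, assuming older messages are folded into a single summary while the most recent few are kept verbatim. `KEEP_RECENT`, `summarize` and `Trajectory` are hypothetical names, and the stub summarizer stands in for what would normally be another LLM call (and therefore more tokens).

```python
from dataclasses import dataclass, field
from typing import List

COMPACT_EVERY = 20  # interval cited in the review
KEEP_RECENT = 4     # hypothetical: raw messages kept after each compaction


def summarize(messages: List[str]) -> str:
    """Placeholder summarizer; a real harness would ask the model for this."""
    return f"[summary of {len(messages)} earlier messages]"


@dataclass
class Trajectory:
    messages: List[str] = field(default_factory=list)
    steps: int = 0
    compactions: int = 0

    def record(self, message: str) -> None:
        """Append one step's message and compact on every 20th step."""
        self.messages.append(message)
        self.steps += 1
        if self.steps % COMPACT_EVERY == 0:
            self._compact()

    def _compact(self) -> None:
        """Replace everything but the last KEEP_RECENT messages with a summary."""
        old = self.messages[:-KEEP_RECENT]
        recent = self.messages[-KEEP_RECENT:]
        self.messages = [summarize(old)] + recent
        self.compactions += 1


traj = Trajectory()
for i in range(50):
    traj.record(f"step {i}")
print(traj.compactions, len(traj.messages))
```

Under these assumptions a 50-step run triggers two compactions and ends with 15 messages in context (one summary plus 14 raw steps), and a 100-step run triggers five. Compaction caps what is re-sent on later calls, but producing each summary also costs tokens, which is part of why long tasks stay expensive.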

Best for

Developers and researchers who want a minimal, hackable web agent harness they can actually read and modify — especially if they already use Claude Code or Codex and want to bolt on serious browser automation.

Pricing

Free (open-source)

MIT-licensed; you pay only for the LLM API tokens you consume (OpenAI, Anthropic, or OpenRouter backends supported).

Alternatives worth knowing

ChatGPT Operator

OpenAI's browser agent. Clicks and types on websites for you.

Manus

Autonomous AI agent that actually finishes tasks.

Claude Agent SDK

Anthropic's SDK for building your own agents on Claude.

Cline

Open-source VS Code agent. Reads + writes + runs.

Devin

Cognition Labs' autonomous coding engineer.