Webwright

A tier · new this week

Microsoft Research's open-source web agent framework that turns your coding model into a browser-automating power user by giving it a terminal and Playwright instead of a point-and-click GUI.

Kai's verdict

Webwright's code-first philosophy is genuinely clever: treating browser sessions as disposable artifacts and scripts as the real output is a smarter abstraction than most click-replay agents offer, and the benchmark gains back that up. That said, this is a research repo, not a product, so expect rough edges and bring your own API budget. (Verdict pending Phi's full review.)

Strengths

  • Code-over-clicks architecture: writes reusable Playwright scripts instead of fragile pixel-level click predictions, which cuts error accumulation on long-horizon tasks
  • SOTA benchmark numbers: 86.7% on Online-Mind2Web and 60.1% on Odysseys with GPT-5.4, the highest score among open-source harnesses in the Online-Mind2Web AutoEval category
  • Tiny, auditable codebase (~1,000 lines across 3 modules) with no hidden orchestration layers, so it is easy to fork, debug, and extend
  • Reusable CLI tool output: completed task scripts can be parameterized, exported, and shared across Claude Code, Codex, and other agents
  • Works with smaller models: Qwen3.5-9B hits 66.2% on the hard Online-Mind2Web split when paired with pre-built tool scripts
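
To make the "scripts as the real output" idea concrete, here is a minimal sketch of what a parameterized Playwright task script could look like once it is wrapped as a small CLI tool. It is an illustration only, not Webwright's actual output format: the function names (build_parser, fetch_title, main) and the flags (--url, --headed, --timeout-ms) are hypothetical, and the task (reading a page title) is deliberately trivial. It assumes Playwright for Python and a Chromium build are installed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Expose the task's inputs as flags so the script can be reused (flags are hypothetical)."""
    parser = argparse.ArgumentParser(description="Print a page title (illustrative task).")
    parser.add_argument("--url", required=True, help="page to open")
    parser.add_argument("--headed", action="store_true", help="show the browser window")
    parser.add_argument("--timeout-ms", type=int, default=30000, help="navigation timeout")
    return parser


def fetch_title(url: str, headless: bool = True, timeout_ms: int = 30000) -> str:
    """Open a fresh browser, load the page, return its title, then discard the session."""
    # Imported lazily so argument parsing works even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.title()
        finally:
            # The browser session is disposable; only the script is kept.
            browser.close()


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    print(fetch_title(args.url, headless=not args.headed, timeout_ms=args.timeout_ms))
    return 0
```

Calling main(["--url", "https://example.com"]) runs the task once. Adding the usual `if __name__ == "__main__": raise SystemExit(main())` guard would let another agent invoke the file as a shell command, which is the kind of reuse the review describes.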

Weaknesses

  • Developer-only: terminal/CLI use only, with no GUI and no hosted service; requires a local setup with Python, Playwright, and your own LLM API keys
  • Token costs can be substantial: long coding trajectories require context compaction every 20 steps, and complex tasks can run 50–100 steps deep
  • Early-stage research release: minimal documentation, no managed hosting, and real-world reliability outside benchmark conditions is unproven
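
The token-cost point is easier to picture with a toy model of periodic context compaction. The sketch below is an assumption about how such a loop might work, not Webwright's implementation: only the 20-step interval comes from the review, while the function names, the keep_last window, and the placeholder summarizer are hypothetical. A real harness would presumably ask the LLM to write the summary, which itself costs tokens.

```python
from typing import Callable, List

COMPACT_EVERY = 20  # interval quoted in the review; everything else here is assumed


def naive_summary(steps: List[str]) -> str:
    """Placeholder summarizer that only records how many entries were folded away."""
    return f"[summary of {len(steps)} earlier steps]"


def compact(history: List[str], keep_last: int = 5,
            summarize: Callable[[List[str]], str] = naive_summary) -> List[str]:
    """Collapse everything except the most recent keep_last entries into one summary."""
    if len(history) <= keep_last:
        return list(history)
    split = len(history) - keep_last
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


def run_trajectory(total_steps: int) -> List[str]:
    """Simulate an agent loop that compacts its context every COMPACT_EVERY steps."""
    history: List[str] = []
    for step in range(1, total_steps + 1):
        history.append(f"step {step}: action + observation")
        if step % COMPACT_EVERY == 0:
            history = compact(history)
    return history
```

Under these assumptions a 100-step task would trigger compaction five times, and the context would still grow back toward 20-plus entries between compactions, which is why deep tasks stay expensive even with compaction in place.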

Best for

Developers and researchers who want a minimal, hackable web agent harness they can actually read and modify, especially those who already use Claude Code or Codex and want to bolt on serious browser automation.

Pricing

Free (open-source)

MIT-licensed; you pay only for the LLM API tokens you consume (OpenAI, Anthropic, or OpenRouter backends supported).

Alternatives worth knowing