GPT-5.4 Can Control Your Computer: OpenAI Crosses a New Frontier
OpenAI launches GPT-5.4, an AI agent that autonomously controls a computer. Record OSWorld scores, 3 versions and advanced computer use.

On March 16, 2026, OpenAI launched GPT-5.4, a model capable of autonomously controlling a computer. It can open apps, click, fill out forms and browse the web, all without human intervention. GPT-5.4 sets new records on the OSWorld-Verified and WebArena benchmarks, with 18% fewer errors than GPT-5.2. Three versions are available: standard, Thinking and Pro.
GPT-5.4: Far More Than a Chatbot
GPT-5.4 is not a simple incremental update. It's the first OpenAI model designed from the ground up for computer use — the ability for an AI agent to control a computer just like a human would.
The model comes in three versions. GPT-5.4 standard handles common tasks. GPT-5.4 Thinking adds deep reasoning for complex workflows. GPT-5.4 Pro is the most capable version, built for demanding professional use cases.
All three versions share a 1-million token context window. Compared to GPT-5.2, the main improvement is reliability: 18% fewer errors on desktop interaction tasks. This is no longer a prototype. It's a production-ready tool.
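To make the "18% fewer errors" figure concrete, here is a minimal arithmetic sketch. It assumes the reduction is relative to GPT-5.2's error count, and the baseline of 50 failed tasks out of 1,000 is purely hypothetical, since the article gives no absolute error rates.

```python
def errors_after_reduction(baseline_errors: float, relative_reduction: float) -> float:
    """Errors remaining after a relative reduction (0.18 means 18% fewer)."""
    return baseline_errors * (1.0 - relative_reduction)


# Hypothetical baseline (not from the article): 50 failed tasks out of 1,000.
remaining = errors_after_reduction(50, 0.18)
print(round(remaining, 1))  # 41.0
```

Under that assumed baseline, 41 tasks out of 1,000 would still fail. A relative improvement says nothing about the absolute rate, which is why the error rate remains a limitation.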
How Does Computer Use Actually Work?
Computer use (autonomous computer control) works on a simple principle. The agent captures screenshots, identifies clickable elements (buttons, text fields, menus), then executes actions: clicks, keystrokes, scrolling, switching between apps.
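The capture, identify and act cycle can be sketched as a small loop. This is an illustrative toy, not OpenAI's implementation: `ToyScreen`, `scripted_policy` and `run_agent` are hypothetical names, the "screenshot" is structured state rather than pixels, and a hard-coded policy stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # label of the UI element to act on
    text: str = ""     # text to enter for "type" actions


@dataclass
class ToyScreen:
    """Stand-in for a real desktop: a form with two fields and a submit button."""
    fields: Dict[str, str] = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def capture(self) -> dict:
        # A real agent would receive pixels; this toy returns structured state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: dict, goal: Dict[str, str]) -> Action:
    """Hypothetical stand-in for the model: choose the next action from what is on screen."""
    if observation["submitted"]:
        return Action("done")
    for name, value in goal.items():
        if observation["fields"][name] != value:
            return Action("type", target=name, text=value)
    return Action("click", target="submit")


def run_agent(screen: ToyScreen, goal: Dict[str, str], max_steps: int = 10) -> List[Action]:
    """Observe, decide, act, and repeat until the policy reports it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = screen.capture()
        action = scripted_policy(observation, goal)
        if action.kind == "done":
            break
        screen.execute(action)
        history.append(action)
    return history


screen = ToyScreen()
history = run_agent(screen, {"name": "Thomas", "email": "thomas@example.com"})
print([a.kind for a in history], screen.submitted)
```

The `max_steps` cap is a design choice worth keeping in any real agent loop: it bounds how long the agent can act if it never reaches the goal.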
In practice, you can give it an instruction like: "Book a Paris-to-New York flight for April 15, email the confirmation to Thomas and add the trip to the calendar." GPT-5.4 executes each step in sequence, without you touching the keyboard.
The model works on macOS, Windows and Linux. It adapts to each operating system's interface and recognizes visual elements even if their position changes.
For developers, the most compelling use case: an agent that opens Cursor or another AI IDE, writes code, runs a build, reads compilation errors and fixes them in a loop, without intervention.
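The build, read errors and fix cycle described here can be sketched in a few lines. This is a minimal illustration under stated assumptions: `build_fix_loop` and `make_toy_project` are hypothetical helpers, and the "build" and "fix" steps are simulated rather than driving a real compiler or model.

```python
from typing import Callable, List, Optional, Tuple


def build_fix_loop(
    build: Callable[[], Tuple[bool, List[str]]],
    propose_fix: Callable[[List[str]], None],
    max_attempts: int = 5,
) -> Optional[int]:
    """Run the build, hand any errors to the fixer, repeat until success.

    Returns the number of builds it took, or None if the budget ran out.
    """
    for attempt in range(1, max_attempts + 1):
        succeeded, errors = build()
        if succeeded:
            return attempt
        propose_fix(errors)
    return None


def make_toy_project(initial_errors: List[str]):
    """Simulated project: each fix resolves the first outstanding error."""
    pending = list(initial_errors)

    def build() -> Tuple[bool, List[str]]:
        # Report only the first error, as many compilers effectively do.
        return (not pending, pending[:1])

    def propose_fix(errors: List[str]) -> None:
        for error in errors:
            pending.remove(error)

    return build, propose_fix


build, propose_fix = make_toy_project(["missing import", "type mismatch"])
print(build_fix_loop(build, propose_fix))  # 3
```

Two errors take three builds: two failing builds, each followed by a fix, then one that passes. The attempt budget keeps an agent from looping forever on an error it cannot resolve.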
The Benchmarks: GPT-5.4 Surpasses Humans on Desktop Tasks
GPT-5.4 sets new records on both reference benchmarks for computer use.
On OSWorld-Verified, the benchmark evaluating an agent's ability to complete tasks on a real operating system, GPT-5.4 exceeds the human reference score of 72.4%. It's the first time a model has crossed this threshold.
On WebArena, which measures performance on autonomous web tasks (browsing, form filling, multi-site interactions), GPT-5.4 also sets a new record.
| Solution | Publisher | Launch | Accuracy | Availability |
|---|---|---|---|---|
| GPT-5.4 | OpenAI | March 2026 | OSWorld record | API + Pro |
| Claude computer use | Anthropic | Oct 2024 | Good | API |
| Project Mariner | Google | Dec 2024 | Good | Beta |
| Copilot Actions | Microsoft | Jan 2026 | Decent | M365 |
| Operator | OpenAI | Jan 2025 | Previous gen | API |
The key precedent: Anthropic launched computer use in beta in October 2024 with Claude 3.5 Sonnet. GPT-5.4 goes significantly further in accuracy and reliability on multi-step tasks.
The Computer Use War: OpenAI, Anthropic, Google, Microsoft
Computer use has become the next major AI battle after LLMs. Every major player is positioning for it.
Anthropic was first to launch a public beta with Claude computer use. Google followed with Project Mariner, a web navigation agent integrated into Chrome. Microsoft pushes Copilot Actions into Microsoft 365, targeting office tasks. And OpenAI had already launched Operator in January 2025, a more limited first iteration.
With GPT-5.4, Sam Altman and OpenAI take the lead. The OSWorld record sends a strong signal: GPT-5.4 isn't just the best chatbot. It's the most reliable autonomous agent on the market.
The impact hits roles directly. Virtual assistance, customer support, ops, finance, HR: repetitive computer tasks across all of them are now automatable. Autonomous AI agents are moving from concept to product.
What This Changes for Devs and Builders
For developers, GPT-5.4 opens up a long-theorized scenario: the near-autonomous development workflow. An agent that controls a computer can run Cursor or Replit, launch tests, deploy code and manage SaaS tools with no human in the loop.
Combined with a persistent memory system like Nyne, a GPT-5.4 agent that knows your stack, conventions and preferences becomes a formidable collaborator.
Limitations exist. The error rate isn't zero: 18% fewer errors than GPT-5.2 is an improvement, not perfection. Security raises concerns, since content displayed on screen can potentially inject malicious instructions into the agent (visual prompt injection). Apps requiring biometric authentication remain inaccessible. And the Pro version isn't available on the free tier.
Despite these caveats, the direction is clear. Frontier models like DeepSeek V4 compete on reasoning. GPT-5.4 competes on action.
Key Takeaways
- GPT-5.4 is OpenAI's new model capable of autonomously controlling a computer: clicks, forms, web browsing and multi-step tasks.
- It sets records on the OSWorld-Verified benchmark (surpassing the 72.4% human score) and on WebArena, with 18% fewer errors than GPT-5.2.
- Three versions are available: GPT-5.4 standard, Thinking and Pro, with a 1-million token context window.
- The computer use battle pits OpenAI, Anthropic (Claude), Google (Project Mariner) and Microsoft (Copilot Actions) against one another, and GPT-5.4 takes the lead.
- Limitations include a residual error rate, visual prompt injection risks and high pricing for Pro, which isn't available on the free tier.
If an agent can control your computer better than you on repetitive tasks (booking, filling, deploying, testing), what is the irreplaceable role of the human in a development workflow? Probably the same as it's always been: deciding what to build, and why.


