AI · April 24, 2026 · 09:00 AM · 7 min read · By Paul Lefizelier

OpenAI Launches GPT-5.5: The Agentic Model Laying the First Brick of the ChatGPT + Codex + Atlas Super App

On April 23, 2026, OpenAI unveiled GPT-5.5, its first fully retrained base model since GPT-4.5. It scores 84.9% on GDPval, 78.7% on OSWorld-Verified, and 82.7% on Terminal-Bench 2.0, with API pricing doubled to $5 / $30 per million tokens (input / output). The goal: merge ChatGPT, Codex, and Atlas into a single agentic session.

OpenAI is no longer in the business of selling an assistant. On April 23, 2026, the company unveiled GPT-5.5, which Sam Altman called "the most intelligent and most intuitive-to-use model" the company has shipped, and which Greg Brockman described as "the first concrete brick of the super app." Under the hood, it is the first fully retrained base model since GPT-4.5, not another round of post-training layered on top of GPT-5. The scores are aggressive: 84.9% on GDPval, 78.7% on OSWorld-Verified, 82.7% on Terminal-Bench 2.0, 98.0% on Tau2-bench Telecom. All of this comes at doubled API pricing: $5 input and $30 output per million tokens, against $2.50 / $15 for GPT-5.4. The message is clear: cheap tokens are no longer the battlefield.

The super app bet: one session, multiple surfaces

For the past three months, OpenAI has been quietly engineering something that no longer looks like an API. The strategy, dubbed "unified desktop" internally, merges three products into a single session:

ChatGPT — the primary conversation and user memory
Codex — the development environment and the computer use surface
Atlas — the browser-agent executing against the DOM

GPT-5.5 is the single underlying model stitching the three together. When a user moves from research in ChatGPT to a deploy in Codex to browsing in Atlas, the context isn't reloaded: it stays inside the model's same working context. The obvious parallel is Claude Code on Mac, announced last fall with Claude Computer Use, but with OpenAI's consumer front end behind it. It is also the direct follow-up to Codex Desktop, announced three days ago with 90 plugins and memory: GPT-5.5 is the brain Codex Desktop was waiting for.

The benchmarks — and what they reveal about positioning

The numbers OpenAI published tell a specific story: the company is done chasing raw SWE-Bench scores and is positioning itself on real work.

Benchmark	GPT-5.5 score	What it measures
GDPval	84.9%	Quality of deliverables across 44 knowledge-work occupations
OSWorld-Verified	78.7%	Autonomous execution in real OS environments
Terminal-Bench 2.0	82.7%	Ability to drive long shell sessions
SWE-Bench Pro	58.6%	Real-world software engineering tasks
Tau2-bench Telecom	98.0%	Business agents without prompt tuning

The benchmark selection is deliberate. GDPval, OSWorld-Verified, and Terminal-Bench measure the agent, not snippet completion. On SWE-Bench Pro, 58.6% is respectable but not podium-level. For reference, Claude Opus 4.7 ships at 87% on SWE-Bench, and Qwen 3.6 35B-A3B hits around 75% on SWE-Bench (SWE-Bench and SWE-Bench Pro are different test sets, so the gap is indicative rather than exact). OpenAI is openly conceding the raw-coding title and claiming pole position on "the computer as the agent's playing field." It's a market call: Anthropic for the IDE, OpenAI for the screen.

Doubled pricing — and what it says about the market

The big surprise of the launch isn't technical. It's commercial. GPT-5.5 is twice as expensive on the API as GPT-5.4:

Model	Input ($/M tokens)	Output ($/M tokens)
GPT-5.4	2.50	15.00
GPT-5.5	5.00	30.00
GPT-5.5 Pro	30.00	180.00

In a market where Qwen 3.6 is open-source under Apache 2.0 and where Gemini 2.5 Flash drops below a dollar per million input tokens, doubling the price is a bet. OpenAI justifies it by superior reasoning efficiency ("fewer tokens for a better answer") and by the value created for agentic use cases. In other words, the per-token price is the wrong signal; what matters is the price per task completed. It is the same line Anthropic holds by refusing the $800 billion valuation to preserve its pricing trajectory: no one wants to be Uber-2015 in 2026.
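
To make the "price per task completed" argument concrete, here is a minimal Python sketch. The per-million-token prices come from the table above; the token counts are hypothetical and chosen only for illustration, and `task_cost` is a helper name introduced here, not part of any OpenAI SDK.

```python
# Prices in USD per million tokens, taken from the pricing table above.
PRICES = {
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "gpt-5.5": {"input": 5.00, "output": 30.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task given its token usage."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: these token counts are illustrative, not measured.
old = task_cost("gpt-5.4", input_tokens=40_000, output_tokens=8_000)
same_tokens = task_cost("gpt-5.5", input_tokens=40_000, output_tokens=8_000)
half_tokens = task_cost("gpt-5.5", input_tokens=20_000, output_tokens=4_000)

print(f"GPT-5.4:                 ${old:.2f}")
print(f"GPT-5.5, same tokens:    ${same_tokens:.2f}")
print(f"GPT-5.5, half the tokens: ${half_tokens:.2f}")
```

Under these assumed numbers, GPT-5.5 costs twice as much per task if it consumes the same tokens, and only breaks even if it finishes the task with half of them. Because input and output prices both doubled, a uniform 2x token saving is the break-even point regardless of the input/output mix.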

Memory, tools, GPT-5.5 Pro

Three underrated elements of the launch are worth listing:

Extended session memory. GPT-5.5 holds a multi-hour session in context without aggressive resummarization. That is the prerequisite for a coding agent that doesn't ask "which file was it again?" after four tool calls.

Model-driven tool switching. Rather than letting the orchestrator call tools in an outer loop, GPT-5.5 itself decides when to call Atlas, the terminal, the editor, or the file system. The orchestrator becomes a guardrail, not a router.

GPT-5.5 Pro — reserved for long tasks. The Pro tier at $30 / $180 is positioned for scientific research, multi-step reports, audits. It's the only model in the line-up capable of maintaining a chain of reasoning across more than twelve tools without losing coherence, per OpenAI's internal figures.
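
The "guardrail, not a router" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpenAI's implementation: the `Guardrail` class, the `Policy` type, and the scripted policy standing in for the model are all hypothetical names. The point is only that the model picks the next tool, while the orchestrator limits itself to an allowlist and a step budget.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A "policy" stands in for the model: given the transcript so far, it returns
# the next (tool, argument) pair, or None when it considers the task done.
Policy = Callable[[List[str]], Optional[Tuple[str, str]]]


@dataclass
class Guardrail:
    """Orchestrator that never chooses tools; it only vets the model's choices."""
    allowed_tools: Dict[str, Callable[[str], str]]
    max_steps: int = 12  # hypothetical step budget
    transcript: List[str] = field(default_factory=list)

    def run(self, policy: Policy) -> List[str]:
        for _ in range(self.max_steps):
            choice = policy(self.transcript)
            if choice is None:  # the model decided it is finished
                break
            tool, arg = choice
            if tool not in self.allowed_tools:
                # Refuse the call, but record it so the model can see the refusal.
                self.transcript.append(f"blocked:{tool}")
                continue
            self.transcript.append(f"{tool}:{self.allowed_tools[tool](arg)}")
        return self.transcript


def scripted_policy(transcript: List[str]) -> Optional[Tuple[str, str]]:
    """Toy stand-in for the model: replays a fixed plan, one step per call."""
    plan = [("terminal", "ls"), ("shell_root", "rm -rf /"), ("editor", "main.py")]
    return plan[len(transcript)] if len(transcript) < len(plan) else None


tools = {
    "terminal": lambda cmd: f"ran {cmd}",
    "editor": lambda path: f"opened {path}",
}
log = Guardrail(allowed_tools=tools).run(scripted_policy)
print(log)
```

In this toy run, the two allowlisted calls go through and the call to the unlisted `shell_root` tool is recorded as blocked. A router-style orchestrator would instead contain the logic deciding which tool comes next; here that decision lives entirely in the policy.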

Availability and deployment

The model has been available since April 23, 2026 on the Plus, Pro, Business, and Enterprise tiers of ChatGPT. The Pro variant is reserved for Pro, Business, and Enterprise tiers. The API opened on launch day with no waitlist.

On the infrastructure side, GPT-5.5 runs on the Nvidia clusters that NeoCognition and others are also filling up. The GPU shortage isn't over, but OpenAI has locked in enough capacity for a global day-one rollout. It's the only way to defend doubled pricing: make availability a selling point.

Positioning vs Claude, Gemini, Qwen

Axis	GPT-5.5	Claude Opus 4.7	Gemini 2.5 Pro	Qwen 3.6
SWE-Bench	58.6% (Pro)	87%	~72%	~75%
Computer use	78.7% OSWorld	Strong	Project Mariner	Limited
Input price	$5.00	$3.00	$1.25	Open source
License	Proprietary	Proprietary	Proprietary	Apache 2.0
Strategy	Consumer super app	API + enterprise	Full-stack cloud	Open-weight

OpenAI chooses to be the model of the consumer surface. Claude remains the model of the IDE. Gemini is the model of enterprise cloud, as confirmed by Google Cloud Next 2026 with the Gemini Enterprise Agent Platform and the A2A protocol in production at 150 organizations. And Qwen is the model of teams that refuse to pay for an API. The landscape finally segments by use case, not by "who has the biggest benchmark."

What it means for developers and publishers

For developers building AI apps, three consequences:

Prompt engineering for the agent becomes a skill. GPT-5.5 excels at under-specified tasks but collapses on badly described orchestrations. The gap between an agent that delivers and an agent that loops widens.

Per-token price stops being the relevant KPI. Product teams will track "price per task completed," which includes retries, context, and memory. The teams that already figured this out are the ones monetizing their AI apps with @idlen/chat-sdk, where a token the user doesn't use doesn't disappear: it capitalizes.

The agent becomes the subscription unit. ChatGPT Pro at $200/month with unlimited GPT-5.5 Pro isn't a price tag; it's a subscription to the output of a permanent agent. Lovable, Cursor, and Emergent already follow this line. OpenAI just normalized it on the consumer side.
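
The effect of retries on "price per task completed" can be sketched as follows. This is a simplified model, not a measured one: it assumes each attempt succeeds independently with a fixed probability and is retried until it succeeds, so the expected number of attempts is 1 / success rate. The dollar amounts and success rates are hypothetical, and `cost_per_completed_task` is a helper name introduced here.

```python
def cost_per_completed_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD cost of one completed task when failures are retried.

    Assumes independent attempts with a fixed success rate, so the expected
    number of attempts is 1 / success_rate (geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical figures: a cheaper model that often loops versus a pricier
# model that usually delivers on the first try.
cheap = cost_per_completed_task(cost_per_attempt=0.22, success_rate=0.40)
pricey = cost_per_completed_task(cost_per_attempt=0.40, success_rate=0.80)

print(f"cheaper model:  ${cheap:.2f} per completed task")
print(f"pricier model:  ${pricey:.2f} per completed task")
```

With these assumed inputs, the model that costs nearly twice as much per attempt ends up cheaper per completed task, which is the scenario the per-task KPI is meant to surface. With different success rates the ranking can flip, so the metric only helps when teams measure their own completion rates.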

In summary:

OpenAI shipped GPT-5.5 on April 23, 2026 — first fully retrained base model since GPT-4.5.
Key benchmarks: 84.9% on GDPval, 78.7% on OSWorld-Verified, 82.7% on Terminal-Bench 2.0, 98.0% on Tau2-bench Telecom, 58.6% on SWE-Bench Pro.
API pricing doubled: $5 / $30 per million tokens (with the GPT-5.5 Pro variant at $30 / $180).
The super app merges ChatGPT, Codex, and Atlas under a single model and a single session.
Available day one on Plus, Pro, Business, Enterprise, and the API.
Positioning: OpenAI cedes SWE-Bench to Claude and claims pole position on the computer as the agent's playing field.

GPT-5.5 is a bet on form, not on the model. OpenAI has figured out that raw intelligence no longer wins market share — what matters is the integration into the daily life of the developer, the knowledge worker, the researcher. The ChatGPT + Codex + Atlas super app is the embodiment of that bet. Whether enterprises will pay twice the price for a model that scores 58% on SWE-Bench when Claude Opus 4.7 scores 87% is the question. The answer will come with Q2 2026 revenue numbers.
