Anthropic Ships Claude Opus 4.7 — 87.6% on SWE-bench and an Awkward Admission About Mythos
On April 16, 2026, Anthropic made Claude Opus 4.7 generally available: +13 points on its internal coding benchmark, 87.6% on SWE-bench Verified, 3.4x sharper vision. And a first: Anthropic admits that Mythos, unreleased, is better.

On April 16, 2026, Anthropic made Claude Opus 4.7 generally available. The model picks up 6.8 points on SWE-bench Verified (80.8% → 87.6%), 12 points on CursorBench (58% → 70%), and solves four tasks that no previous model — Opus 4.6 or Sonnet 4.6 — could handle. But the real story isn't there. In the launch blog post, Anthropic did something it had never done before: it publicly admitted that another of its models, Claude Mythos, is more capable but will not ship. The official champion is a runner-up.
The Internal Benchmark: +13 Points, Four Previously Impossible Tasks Unlocked
Since 2024, Anthropic has maintained a proprietary benchmark of 93 coding tasks reflecting the real work of senior engineers. It's the suite the team uses internally to decide whether a model is ready to ship. Opus 4.7 gains 13 points over Opus 4.6, and four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve now pass.
| Benchmark | Opus 4.6 | Opus 4.7 | Delta |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 87.6% | +6.8 pts |
| CursorBench | 58% | 70% | +12 pts |
| Internal 93-task benchmark | baseline | +13 pts | +13 pts |
| Vision (max resolution) | 768px | 2,576px | 3.4x |
The CursorBench gain is the most telling. That benchmark measures edit quality inside a real IDE, not GitHub ticket resolution. +12 points on a minor release suggests Anthropic specifically tuned Opus 4.7 for IDE coding agents, which are now the primary vector of API consumption.
Self-Verification: The Real Novelty
Opus 4.7 introduces a capability no previous Anthropic model exposed this explicitly: it verifies its own output before answering. The model rereads its code, mentally runs the tests it wrote, and fixes bugs before they reach the user.
On long agentic tasks — where a model has to chain 20 or 30 tool calls without human supervision — the difference is massive. Engineers who tested the preview report being able to hand off tickets they previously didn't dare delegate: cross-repo refactors, version migrations, debugging intermittent errors. The model no longer just produces plausible code, it verifies that it passes.
This is the capability that was missing from Claude Code 2 earlier this year. With Opus 4.7, the autonomous agent promise becomes credible across a full development cycle.
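The blog post doesn't describe the mechanism, but the behavior it names (draft, check, repair, then answer) maps onto a familiar agent pattern. A minimal sketch of that loop, where `generate` and `run_tests` are hypothetical stand-ins for the model call and its verification harness, not Anthropic APIs:

```python
def self_verify(generate, run_tests, max_rounds=3):
    """Draft-check-repair loop: regenerate until the checks pass.

    `generate` and `run_tests` are placeholders for the model and its
    harness; nothing here is a real Anthropic interface.
    """
    draft = generate(feedback=None)
    for _ in range(max_rounds):
        ok, feedback = run_tests(draft)
        if ok:
            return draft, True          # verified before reaching the user
        draft = generate(feedback=feedback)
    return draft, False                 # budget exhausted, flag as unverified

# Toy demo: the first draft has an off-by-one bug the tests catch.
drafts = iter([lambda n: n * (n - 1) // 2,   # buggy sum(1..n)
               lambda n: n * (n + 1) // 2])  # fixed on retry
generate = lambda feedback: next(drafts)
run_tests = lambda f: (f(4) == 10, "sum(1..4) should be 10")

fn, verified = self_verify(generate, run_tests)
```

The point of the pattern is the return signature: the caller learns whether the output survived its own tests, which is exactly what makes long unsupervised tool chains delegable.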
Vision: 3.4x More Pixels
Second major upgrade: Opus 4.7 accepts images up to 2,576 pixels on the long edge, versus 768 for previous Claude models. That's 3.4x the previous limit.
It changes three concrete use cases.
- Reading dashboard screenshots. A 1920x1080 Grafana chart had to be downscaled past legibility until now; it can be read at full detail.
- Parsing PDFs with dense tables. Financial reports, contracts, and engineering specs become legible without downsampling.
- Vibe design. A designer can upload a high-res Figma export and ask Claude to produce the matching frontend code; something flaky at 768px becomes usable at 2,576px.
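For intuition on what the new cap means in practice, here's a small helper (my own sketch, not part of any Claude SDK) that scales an image's dimensions to fit a long-edge limit:

```python
def fit_long_edge(width, height, max_edge):
    """Scale (width, height) down so the longer side fits within max_edge."""
    scale = min(1.0, max_edge / max(width, height))
    return round(width * scale), round(height * scale)

# A 1920x1080 Grafana screenshot under the old and new limits:
old = fit_long_edge(1920, 1080, 768)    # -> (768, 432): a 60% downscale
new = fit_long_edge(1920, 1080, 2576)   # -> (1920, 1080): untouched
```

Under the old 768px cap, every full-HD screenshot lost more than half its linear resolution before the model ever saw it; under the new cap it passes through unchanged.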
The Mythos Admission: A First
The most unusual part of the launch is one line in the blog post: Anthropic concedes that Mythos, its internal model codenamed Capybara, outperforms Opus 4.7. But Mythos isn't shipping publicly. It's accessible to 50 organizations through Project Glasswing, a cybersecurity initiative, with $100 million in usage credits distributed.
| Model | SWE-bench | USAMO 2026 | Status |
|---|---|---|---|
| Opus 4.6 | 80.8% | 42.3% | Deprecated |
| Opus 4.7 | 87.6% | not released | GA (April 16) |
| Mythos Preview | 93.9% | 97.6% | Limited preview |
The gap is staggering: +6.3 points on SWE-bench over Opus 4.7, and +55.3 points on USAMO 2026 over Opus 4.6 (Anthropic did not release a USAMO score for Opus 4.7). On competition math reasoning, Mythos is in its own league.
Why would Anthropic ship a model while admitting it's hiding a better one? Two readings.
Safety reading. Anthropic identified in Mythos strategic manipulation and exfiltration capabilities it cannot yet mitigate. Shipping it would put a model with offensive capabilities into circulation. Anthropic prefers to keep Mythos in closed preview and monetize the gap through Glasswing.
Business reading. Publicly acknowledging a more powerful model justifies a premium tier strategy. The 50 Glasswing organizations pay for early access. In 6 months, Anthropic ships Mythos publicly and captures a second enterprise upgrade cycle. It's Apple's playbook with M1 Pro vs M1 Max chips.
Unchanged Pricing, Pressure on OpenAI
Opus 4.7 stays at $5/$25 per million input/output tokens, exactly the price of Opus 4.6. Anthropic isn't capturing value through pricing; it's capturing it through lock-in. Teams that already moved their pipelines to Claude get a model that is 13 points better on Anthropic's own benchmark, for free.
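At those rates, per-request cost is simple arithmetic. A quick sketch using the article's quoted prices ($5 input / $25 output per million tokens); the token counts in the demo are illustrative, not measured:

```python
PRICE_IN = 5.00    # USD per million input tokens (article's figure)
PRICE_OUT = 25.00  # USD per million output tokens (article's figure)

def request_cost(input_tokens, output_tokens):
    """Cost in USD of one Opus 4.7 call at the quoted prices."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# A typical agentic coding turn: large context in, moderate diff out.
cost = request_cost(input_tokens=40_000, output_tokens=3_000)  # -> $0.275
```

Identical unit prices across 4.6 and 4.7 means the only variable that moves a team's bill is how many iterations a task takes, which is where the benchmark gains cash out.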
For OpenAI, the pressure becomes specific. GPT-5.4 sits at 52% on SWE-bench Verified per recent leaks. Opus 4.7 at 87.6% widens the gap to 35.6 points. On the coding assistant market (42% of enterprise API spend), Anthropic's dominance can no longer be called cyclical. It's structural.
| Segment | Leader | Gap vs #2 |
|---|---|---|
| SWE-bench Verified | Opus 4.7 (87.6%) | +35.6 pts vs GPT-5.4 |
| CursorBench | Opus 4.7 (70%) | +18 pts vs GPT-5.4 |
| Dense vision OCR | Opus 4.7 | 3.4x pixels vs GPT-5.4 |
| Math reasoning | Mythos Preview | off-market |
What Developers Get Concretely
In Claude Code / Cursor / Windsurf. The switch is automatic for Pro and Team users. No configuration. Pull requests that needed 3-4 iterations on Opus 4.6 close in 1-2 on 4.7.
In the API. Two months of free credit for customers who were on Opus 4.6 — Anthropic is explicitly pushing the migration.
In Bedrock, Vertex AI, Foundry. Immediate availability. Amazon, Google and Microsoft integrated Opus 4.7 on launch day — a first for an Anthropic model.
In GitHub Copilot. The "Claude Opus 4.7" option rolled out April 16 for Enterprise users. GitHub updated its changelog the same day.
What It Changes for the Market
For vibe coding startups. Cursor, Lovable, Replit, Emergent see their perceived quality rise without product effort. Pure upside for integrators. But it also reinforces the "the model is the product" thesis — and those integrators become fragile if Anthropic ships a native app builder.
For enterprise CTOs. The "let's wait for the next generation" argument is dead. Every quarter, Anthropic ships an Opus X.Y that makes six-month-old architectures suboptimal. Teams that froze their stack on Opus 4.5 in January are already re-planning.
For OpenAI. Spud / GPT-6 has to match 87.6% SWE-bench or the gap becomes uncatchable in 2026. If GPT-6 ships in May at 85%, Anthropic will have 6 months of coding lead — the most lucrative API segment.
In summary:
- Claude Opus 4.7 generally available April 16, 2026, on Claude, API, Bedrock, Vertex AI, Foundry, GitHub Copilot
- SWE-bench Verified: 80.8% → 87.6% (+6.8 pts), CursorBench: 58% → 70% (+12 pts), internal benchmark +13 pts
- Self-verification: the model checks its outputs before shipping them — central capability for long agents
- Vision: 2,576 pixels (3.4x previous resolution) — changes OCR, PDF reading, vibe design
- Unprecedented admission: Anthropic concedes Mythos (93.9% SWE-bench, 97.6% USAMO) is better but stays in closed preview for 50 organizations
- Unchanged pricing: $5/$25 per million input/output tokens, same as Opus 4.6
- Gap with GPT-5.4 on SWE-bench reaches 35.6 points: structural coding dominance
Opus 4.7 is not a breakthrough release. It's a routine one, and that's exactly what should worry competitors. Every 6 to 8 weeks for 15 months, Dario Amodei's company has shipped a better model, at the same price, without fanfare. The model is the product. The product is superior. And the company keeps an even better model in reserve, commercializing it only to 50 hand-picked organizations. When your competitor publicly announces they have something better than what they're selling you, it's no longer competition; it's a demonstration.
Sources: Anthropic — Introducing Claude Opus 4.7, Axios — Anthropic concedes Opus 4.7 trails Mythos, SiliconANGLE — Claude Opus 4.7 coding visual, GitHub Changelog — Opus 4.7 GA.


