Meta Launches Open-Weight Llama 4 Scout and Maverick: Native MoE, Multimodal, 10-Million-Token Context
On April 5, 2026, Meta released Llama 4 Scout and Maverick — the first Llama models built on Mixture-of-Experts with native multimodality. Scout: 17B active params, 16 experts, 10M context. Maverick: 17B active, 128 experts, beats GPT-4o.

On April 5, 2026, Meta shipped Llama 4 Scout and Llama 4 Maverick, the first two models of a new family built from the ground up as natively multimodal on a Mixture-of-Experts architecture. Scout ships with a 10-million-token context window, the largest ever released as open weights. Maverick beats GPT-4o and Gemini 2.0 Flash on most mainstream benchmarks while activating only 17 billion parameters per forward pass. Both are downloadable on Hugging Face and llama.com. It's Meta's comeback after the perceived miss of Llama 3.3 and the $14 billion poured into Scale AI and its CEO Alexandr Wang.
Two Models, One MoE Architecture
Llama 4 closes the book on dense models. Scout and Maverick are both built around a Mixture-of-Experts (MoE) architecture, a technique in which the feed-forward layers are split into multiple specialized "experts" and only a small subset is activated for each token. That's what lets a giant model (400 billion total parameters for Maverick) run inference at the cost of a 17-billion-parameter model.
| Model | Active Params | Experts | Total Params | Context |
|---|---|---|---|---|
| Llama 4 Scout | 17B | 16 | ~109B | 10M tokens |
| Llama 4 Maverick | 17B | 128 | 400B | 1M tokens |
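The routing idea behind these numbers can be sketched in a few lines. This is a toy illustration, not Llama 4's actual router: a learned gate scores all experts per token, only the top-k experts actually run, and their outputs are mixed by the gate weights. Compute therefore scales with k, not with the total expert count.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=1):
    """Toy top-k MoE layer: route each token to its k best-scoring experts.

    x:        (tokens, d) input activations
    experts:  list of (d, d) expert weight matrices
    router_w: (d, n_experts) router weights
    """
    logits = x @ router_w                        # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))  # softmax over selected only
    gates /= gates.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                  # only k experts run per token
        for j, e in enumerate(topk[t]):
            out[t] += gates[t, j] * (x[t] @ experts[e])
    return out, topk

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 16, 4                  # 16 experts, like Scout
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(tokens, d))
y, routed = moe_forward(x, experts, router_w, k=1)
```

With k=1 and 16 experts, each token touches one sixteenth of the expert weights, which is the mechanism behind the "400B total, 17B active" headline figures.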
Scout is positioned as the efficient model — small on active compute, enormous on context capacity. Ten million tokens is roughly 7,500 pages. An entire codebase. A complete book with every reference. A full legal corpus for a contract. It's the model designed for agentic workflows that spend their lives juggling heavy knowledge bases.
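A 10-million-token window is not just a bigger number; the KV cache alone becomes the dominant memory cost. A back-of-envelope estimate, using hypothetical layer dimensions (NOT Scout's published config), shows the scale of the problem Meta's attention design has to tame:

```python
def kv_cache_gb(context_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    """Naive KV-cache size: 2 tensors (K and V) per layer, per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
    return context_len * per_token / 1e9

# Illustrative dimensions only, chosen for the sake of the arithmetic:
cache_gb = kv_cache_gb(10_000_000, n_layers=48, n_kv_heads=8, head_dim=128)
# ~1966 GB: roughly 2 TB of cache for a single full-length sequence
```

Whatever attention tricks the production model uses to compress this, the naive math explains why no one had shipped a 10M-token open-weight model before.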
Maverick is positioned as the performant model. 128 experts, 400 billion total parameters, benchmarks beating GPT-4o and Gemini 2.0 Flash while running at roughly 60% of DeepSeek V3's compute cost.
Native Multimodality, Not Bolt-On
This is what distinguishes Llama 4 from earlier generations. Text, images, and video are processed by the same layers — not through a visual encoder attached after the fact. Meta calls it "early fusion": text tokens and visual tokens share the same embedding space from the first transformer layer.
The consequences are practical. A question about a video frame can reference text in the same inference without a round-trip. A visual agent can reason on a screenshot and a log in parallel. The model can generate image descriptions consistent with the tone of an upstream document.
For developers building AI agents, the difference is massive: no more chaining GPT-4o vision + GPT-4o text. Maverick does both in the same pass, with the same contextual coherence.
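The "early fusion" idea reduces to this: project text tokens and image patches into one shared embedding space, concatenate, and feed the fused sequence through the same transformer stack from layer one. A minimal sketch, with toy dimensions and random weights standing in for the real embedding tables and vision tower:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64

# Stand-ins for learned components (hypothetical shapes):
text_embed = rng.normal(size=(1000, d_model))  # toy vocab of 1000 tokens
patch_proj = rng.normal(size=(768, d_model))   # projects 768-dim patch features

text_ids = np.array([5, 42, 7])                # e.g. "describe this image"
img_patches = rng.normal(size=(16, 768))       # 16 patches from a vision tower

text_tokens = text_embed[text_ids]             # (3, d_model)
image_tokens = img_patches @ patch_proj        # (16, d_model)

# One fused sequence: the FIRST transformer layer already attends
# across both modalities, instead of a vision encoder bolted on later.
fused = np.concatenate([text_tokens, image_tokens], axis=0)  # (19, d_model)
```

The contrast with bolt-on designs is that there, cross-modal interaction only happens through a late adapter; here every attention layer sees both token types.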
Benchmarks: Maverick Beats GPT-4o, Scout Crushes Its Tier
The numbers published by Meta (to be taken with the usual caveat for self-reported benchmarks) show Maverick ahead on MMLU, MATH, HumanEval and ChartQA versus GPT-4o and Gemini 2.0 Flash. On MATH, Maverick hits 78.5% versus 76.6% for GPT-4o. On HumanEval (coding), 83% versus 80%.
| Benchmark | Llama 4 Maverick | GPT-4o | Gemini 2.0 Flash |
|---|---|---|---|
| MMLU | 85.2% | 85.7% | 83.9% |
| MATH | 78.5% | 76.6% | 76.8% |
| HumanEval | 83.0% | 80.0% | 79.5% |
| ChartQA | 90.0% | 85.7% | 85.5% |
Scout, in its tier (sub-20B active), crushes Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 on every public benchmark. Meta reports successful needle-in-a-haystack retrieval across the full 10-million-token window.
Keep in mind that Maverick's weights total 400 billion parameters. Even open-weight, it's hard to run: BF16 weights alone come to roughly 800 GB, which means a full 8×H200 node (about 1.1 TB of combined VRAM), or aggressive quantization to drop to fewer GPUs. Scout's ~109B total parameters are far more accessible; Meta's stated target is a single H100 with Int4 quantization.
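The back-of-envelope arithmetic is simple: weight memory is total parameters times bytes per parameter (2 for BF16, 0.5 for Int4), and real serving needs extra headroom on top for KV cache and activations:

```python
def weight_memory_gb(total_params_b, bytes_per_param):
    """Memory for the weights alone: no KV cache, no activation headroom."""
    return total_params_b * 1e9 * bytes_per_param / 1e9  # GB

maverick_bf16 = weight_memory_gb(400, 2)    # 800 GB  -> multi-GPU node
maverick_int4 = weight_memory_gb(400, 0.5)  # 200 GB
scout_bf16    = weight_memory_gb(109, 2)    # 218 GB
scout_int4    = weight_memory_gb(109, 0.5)  # 54.5 GB -> single 80 GB H100 territory
```

Note the MoE subtlety: ALL 400B weights must sit in memory even though only 17B fire per token, so memory is priced by total parameters while compute is priced by active ones.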
What's Next: Behemoth, the Real Monster
Scout and Maverick are just the first two releases. Meta confirmed Llama 4 Behemoth — a 2 trillion total parameter model, still training, that will serve as the "teacher model" for distilling smaller models. Announcement scheduled for LlamaCon on April 29.
If Behemoth delivers, it will be the largest open-weight model ever published, surpassing the 1-trillion-parameter DeepSeek V4 that dominated the conversation a few weeks ago. Behemoth's arrival is what's keeping OpenAI awake: a frontier model, open-weight, downloadable, with no commercial restriction below 700 million monthly active users (Meta's usual license threshold).
The Context: $14B to Alexandr Wang, Pressure on Zuckerberg
Llama 4 is the first major model since Meta spent $14 billion to acquire Scale AI and install its CEO Alexandr Wang as Chief AI Officer. The pressure to deliver was enormous. CNBC noted that the market was watching this release as a verdict on the Wang investment.
Early reactions are mixed. Benchmarks are solid. Scout's 10M context is impressive. But Maverick doesn't beat Claude Sonnet 4.6 or GPT-4.5 on complex reasoning tasks. And the fact that Meta is holding Behemoth for LlamaCon on April 29 suggests the real frontier response is still to come.
| Fact | Detail |
|---|---|
| Release date | April 5, 2026 |
| License | Llama 4 Community License (commercial OK < 700M MAU) |
| Available models | Scout (17B active, 10M context) + Maverick (17B active, 128 experts, 400B total) |
| Coming | Behemoth (2T params) — announcement at LlamaCon April 29 |
| Meta 2026 AI CapEx | $60-65 billion |
| Scale AI + Alexandr Wang acquisition | $14 billion |
Why It Matters for the Open-Weight Ecosystem
For sovereignty-conscious enterprises. A GPT-4o-grade model in open-weight changes the game. A bank can now run Maverick on its private infrastructure without sharing any data with OpenAI or Google.
For research. A 10M-token open-weight context enables experiments on long-range workflows that only Gemini 2.5 Pro and Claude Mythos allowed, and those are closed.
For agents. AI agent frameworks (LangChain, CrewAI, AutoGen) had no open-weight model capable of sustaining a long workflow without losing context. Scout unlocks that.
For inference economics. Native MoE reduces inference cost at equivalent total parameters. Inference providers (Together AI, Groq, Fireworks) can serve Maverick at a price close to GPT-4o Mini while offering GPT-4o quality.
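The economics argument comes straight from the MoE arithmetic. Using the standard rough estimate of ~2 FLOPs per active parameter per generated token (a simplification that ignores attention cost and the memory bandwidth needed to hold all 400B weights):

```python
def flops_per_token(active_params_b):
    """Rough decode cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params_b * 1e9

dense_400b = flops_per_token(400)  # a hypothetical dense 400B model
maverick   = flops_per_token(17)   # only 17B of Maverick's 400B fire per token

speedup = dense_400b / maverick    # ~23.5x less compute per token
```

That ~23x compute gap at equal total capacity is what lets inference providers price a 400B-class model near small-model rates.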
What's Missing
Not everything shines:
- No dedicated reasoning model: nothing equivalent to o1 or Claude Mythos for extended reasoning.
- No video generation: Llama 4 ingests video but doesn't generate it.
- Self-reported benchmarks: LMArena and independent evaluations still need to confirm the numbers.
- MoE latency: routing across experts creates branching that complicates batching and latency under load.
TL;DR:
- Meta released Llama 4 Scout and Maverick as open weights on April 5, 2026, on Hugging Face and llama.com
- Native Mixture-of-Experts architecture: Scout (17B active, 16 experts, 10M token context), Maverick (17B active, 128 experts, 400B total params)
- Multimodal by design: text + image + video on the same layers with early fusion
- Maverick beats GPT-4o and Gemini 2.0 Flash on MATH (78.5%), HumanEval (83%), ChartQA (90%)
- Llama 4 Behemoth (2 trillion params) announced for LlamaCon April 29
- First major model since the $14 billion Scale AI / Alexandr Wang acquisition
Llama 4 puts Meta back in the frontier race many saw them exiting after Llama 3.3. The architectural choice sends a signal: native MoE plus native multimodality plus massive context is exactly the template DeepSeek and Mistral are following. Frontier-grade open weights are becoming the standard, not the exception. What remains to be proven is whether Behemoth, on April 29, justifies the $14 billion invested in Scale AI and Alexandr Wang. If it does, the pressure on OpenAI and Anthropic becomes existential: why pay $20 per million tokens for Claude Opus when you can run a comparable model in-house? The answer will be the main story of H2 2026.
Sources: Meta AI — The Llama 4 herd, Hugging Face — Llama 4 release, IBM — watsonx.ai availability, CNBC — first major model since Wang.


