AI · 7 min read · By Paul Lefizelier

Google Gemma 4: AIME 20% → 89%, Codeforces 110 → 2150, Apache 2.0 — The Leap That Redefines Open-Source Models

Google DeepMind launches Gemma 4: four Apache 2.0 open-weight models, 89% on AIME, Codeforces ELO 2150, native function calling, and fully offline operation on smartphones. Frontier AI goes open-source.


AIME 2026: from 20% to 89% in a single generation. Codeforces ELO: from 110 to 2150. Apache 2.0, native function calling, fully offline on smartphones. Gemma 4 just dropped — and this is no longer a "decent open-source model." This is a frontier model. Four variants, available now on Hugging Face.

AIME 89%, Codeforces 2150: Why These Numbers Change Everything

The AIME (American Invitational Mathematics Examination) is the competition that helps select top US students for the International Mathematical Olympiad. It represents elite high-school mathematics: 15 problems in 3 hours, and the hardest problems can stump even the strongest students.

Gemma 3 27B solved 1 in 5. Gemma 4 31B solves 9 in 10. This is not a marginal improvement — it's a category change.

Codeforces ELO measures skill in competitive programming on a universal scale: 800 = beginner, 1600 = Expert, 2100 = Master, 2400 = Grandmaster. Gemma 3 was at 110 — essentially zero. Gemma 4 31B sits at 2150 — Master level, top 1% of competitive programmers worldwide.
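For intuition on what that rating gap means, the standard Elo expected-score formula (Codeforces uses a variant of this system) converts a rating difference into a win probability. A minimal sketch:

```python
def elo_expected_score(r_a, r_b):
    """Expected score (win probability) of player A against player B
    under the standard Elo formula: 1 / (1 + 10^((Rb - Ra) / 400))."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# A 2150-rated Master against a 1600-rated Expert:
p = elo_expected_score(2150, 1600)
print(f"Expected win rate: {p:.1%}")  # roughly 96%
```

A 550-point gap translates to winning about 24 matches out of 25 — which is why a jump from 110 to 2150 is not an incremental gain but a different class of competitor.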

Until now, only closed frontier models reached this territory: OpenAI's o3, Gemini 2.5 Pro, Claude Opus 4. All proprietary, all paid. Gemma 4 delivers this under Apache 2.0 as an open-source model. That's a structural break.

LMArena is a global ranking of AI models based on human preference evaluations. Gemma 4 31B ranks #3 worldwide among open-source models at 1452 points. The 26B MoE ranks #6 at 1441 points. In Google's own words, it "outcompetes models 20x its size."

MoE: 4B Compute Cost for 26B-Level Performance

MoE (Mixture of Experts) is an architecture that changes how a model uses its parameters. Rather than activating all parameters for every token, the model routes each input to the most relevant "experts" — specialized sub-networks within the larger model.

The Gemma 4 26B MoE has 26 billion total parameters but only activates 4 billion during inference. Result: the compute cost of a 4B model, with the performance of a 26B model. LMArena 1441, ranked #6 open-source worldwide.

This is the same efficiency-over-brute-force principle that Google Research applied with TurboQuant for LLM inference compression. With MoE, you can run 26B-class intelligence on hardware that would normally cap out at 4B. For developers working with constrained compute — whether on cloud budgets or on-device — this is a decisive advantage.
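The routing idea can be sketched in a few lines. This is a toy illustration of top-k expert routing, not Gemma 4's actual router (whose internals Google has not detailed): a router scores every expert per token, but only the top-k experts ever execute.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, router, top_k=2):
    """Route one token through only the top_k highest-scoring experts.
    Unchosen experts are never called -- that is the compute saving."""
    probs = softmax([router(i, token) for i in range(len(experts))])
    chosen = sorted(range(len(experts)), key=lambda i: -probs[i])[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Output is the probability-weighted sum of the chosen experts only.
    out = sum(probs[i] / norm * experts[i](token) for i in chosen)
    return out, chosen

# Toy setup: 8 "experts" (simple functions), with a call log to prove
# that only 2 of them actually run for a given token.
calls = []
experts = [lambda tok, i=i: calls.append(i) or (i + tok) for i in range(8)]
router = lambda i, tok: -abs(i - tok % 8)  # prefers expert index ~ token
out, chosen = moe_layer(3.0, experts, router, top_k=2)
print(chosen)  # only 2 of the 8 experts executed
```

Scaled up, the same mechanism is how a 26B-parameter model can pay the inference bill of a 4B one: routing is cheap, and the 22B parameters sitting in unchosen experts cost nothing for that token.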

| Model | Active Params | Context | Target | LMArena |
|---|---|---|---|---|
| E2B | 2B | 128K | Mobile, offline | n/a |
| E4B | 4B | 128K | Edge, Android | n/a |
| 26B MoE | 4B active / 26B total | 256K | Server | 1441 (#6) |
| 31B Dense | 31B | 256K | Cloud, workstation | 1452 (#3) |

Apache 2.0: The License Decision That Unlocks Everything

Previous Gemma versions shipped under a custom license with commercial usage restrictions. Some use cases required prior approval from Google. That's over.

Apache 2.0 is maximum permissiveness in open-source: free commercial use, free modification, free redistribution — no restrictions, no royalties, no scale limits. Any startup can integrate Gemma 4 into a commercial product starting this morning, with no permission required, no license to pay, no ceiling on deployment.

The signal of the week: both Meta (Llama 4) and Google (Gemma 4) are choosing full permissiveness. OpenAI keeps its models closed. The open vs. closed battle in 2026 is tilting clearly toward open. 400 million cumulative downloads across all Gemma versions — the community already existed. It just gained commercial freedom.

| Benchmark | Gemma 3 27B | Gemma 4 31B | Delta |
|---|---|---|---|
| AIME 2026 | 20.8% | 89.2% | +68.4 pts |
| Codeforces ELO | 110 | 2150 | +2040 |
| MMLU Pro | n/a | 85.2% | n/a |
| LMArena score | n/a | 1452 (#3 worldwide) | n/a |

Offline on Smartphones: AI Without the Cloud

E2B and E4B run 100% offline. No internet connection. No remote server. No API call. The model weights live on the device — inference runs entirely on-device.

Target hardware: Android smartphones (Pixel, Qualcomm Snapdragon, MediaTek), Raspberry Pi, NVIDIA Jetson Orin Nano. The audio encoder was compressed from 681 million to 305 million parameters — reducing audio transcription latency from 160ms to 40ms. Real-time transcription on a phone, without a connection, is now realistic.

Edge AI refers to this capacity to run AI models directly on the device (at the "edge" of the network), without routing through the cloud. This is a fundamental architectural shift: generative multimodal AI leaves the datacenter and enters everyone's pocket.

For Android developers: AICore Developer Preview is available today. AICore is Android's native AI inference runtime, optimizing model execution across recent mobile chips. It's the standard path for integrating Gemma 4 into a production Android application.

Native Function Calling: Gemma 4 Inside AI Agents

Function calling enables an AI model to invoke external functions or APIs directly, without intermediary code or prompt engineering workarounds. It is the foundational capability of autonomous AI agents.

This is a first for the Gemma family. Previous versions required fine-tuning or complex prompt engineering to call tools reliably. Gemma 4 does it natively, with standard OpenAI function calling compatibility — meaning tools already built for GPT-4 work directly with Gemma 4, no changes required.

Combined with the MCP (Model Context Protocol) standard that is emerging as agent infrastructure this week — EmDash adopted it for its open-source CMS — Gemma 4 becomes a viable reasoning core for any open-source agent. No fine-tuning. No proprietary infrastructure. Apache 2.0 for commercial redistribution. Open-source agent frameworks like Meta HyperAgents gain a reasoning model that matches the task.

Native system prompt support is included — enabling the system role for structured conversations, which is essential for production agents where behavior must be reliably constrained. Together, function calling + system prompt + MoE efficiency + Apache 2.0 = a production-ready open-source agent brain.
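Since the interface follows the OpenAI function-calling format, a request combining a system prompt and a tool definition is just a standard chat-completions payload. A sketch — the model id (`gemma-4-26b-moe`) and the `track_order` tool are hypothetical names for illustration:

```python
import json

# NOTE: "gemma-4-26b-moe" is an assumed identifier -- substitute whatever
# id your serving stack (Ollama, vLLM, etc.) actually exposes.
payload = {
    "model": "gemma-4-26b-moe",
    "messages": [
        # Native system-role support: constrain agent behavior up front.
        {"role": "system",
         "content": "You are a logistics agent. Call tools; never guess."},
        {"role": "user", "content": "Where is order #4812?"},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "track_order",  # hypothetical tool for illustration
            "description": "Look up the shipping status of an order",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }],
    "tool_choice": "auto",
}
print(json.dumps(payload)[:60], "...")
```

The point of OpenAI compatibility is that this exact payload shape — `messages`, `tools`, `tool_choice` — is what existing agent frameworks already emit, so swapping the model id is the only change required.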

Gemma 4 vs Open-Source Competitors — April 2026

| Model | Lab | Active Params | AIME 2026 | License | Function calling |
|---|---|---|---|---|---|
| Gemma 4 31B | Google | 31B | 89.2% | ✅ Apache 2.0 | ✅ Native |
| Gemma 4 26B MoE | Google | 4B active | 88.3% | ✅ Apache 2.0 | ✅ Native |
| Llama 4 Scout | Meta | 17B active | ~70% | ✅ Llama 4 | n/a |
| Qwen 3.5 27B | Alibaba | 27B | ~75% | Apache 2.0 | n/a |
| DeepSeek-R2 | DeepSeek | n/a | ~85% | MIT | n/a |

Llama 4 Scout has a larger context window (10M tokens vs 256K for Gemma 4 31B), making it the better choice for very long documents. On mathematical reasoning and competitive programming, Gemma 4 leads. On MMLU Pro (85.2%), it outperforms Qwen 3.5 27B. On Codeforces, no other open-source competitor approaches 2150 ELO.


Gemma 4 — Key Takeaways

  • Google DeepMind launches Gemma 4 on April 1, 2026: 4 open-weight models (E2B, E4B, 26B MoE, 31B) under Apache 2.0 — a first for the Gemma family
  • Record performance: AIME 2026 89.2%, Codeforces 2150 ELO, LMArena 1452 (#3 worldwide open-source) — frontier performance in open-source
  • MoE architecture: 26B total, 4B active — 26B-level performance at 4B compute cost
  • Native multimodality, native function calling, an agent-first design — the first Gemma generation built for production AI agents
  • Edge AI: E2B and E4B run 100% offline on Android, Raspberry Pi, NVIDIA Jetson Orin Nano — real-time audio transcription at 40ms latency

Three years ago, "open-source" and "frontier" were antonyms in AI. The best models were closed by definition. Open-source was "decent but not competitive." Gemma 4 solves AIME at 89%. DeepSeek began this convergence in January 2025. Llama 4 and Gemma 4 confirm it in April 2026. The divide between open and closed is no longer a performance divide — it's a strategic choice. And more labs are choosing openness. OpenAI, with Spud, is increasingly alone in the opposite direction. What AI2's MolmoWeb did for autonomous web navigation, Gemma 4 does for reasoning and agents: frontier performance, open-source, available today.

Available on Hugging Face, Kaggle, Ollama, Google AI Studio, and the Gemini API. See Google's official announcement and the technical model card for full details. 140 languages supported, 256K context on the server models.

#google #gemma-4 #open-source #apache-2-0 #google-deepmind #moe #multimodal #edge-ai #function-calling #ai-agents