AI2 Releases MolmoWeb: The Open-Source Web Agent That Beats GPT-4o and Gemini at Autonomous Navigation
AI2 launches MolmoWeb, an open-source multimodal web agent with 8B parameters that outperforms GPT-4o and Claude Computer Use on autonomous web navigation benchmarks. Available on Hugging Face.

An 8 billion parameter model. Open-source. That outperforms web agents from OpenAI, Google and Anthropic on autonomous navigation benchmarks. This is MolmoWeb — released tonight on Hugging Face by the Allen Institute for AI (AI2). Autonomous web navigation just became accessible to everyone.
Autonomous Web Navigation: The Hardest Task for Agents
A web agent is an AI model that can use a browser like a human. Read a page visually. Understand the intent. Find the right button. Click. Handle popups, redirects, dynamic states.
It's the hardest task in agentic AI. The agent doesn't read raw HTML — it interprets a screenshot. It must understand layout, visual hierarchy, interactive elements. Then act with precision.
Until now, only three models could do this reliably: OpenAI's GPT-4o, Anthropic's Claude Computer Use, and Google's Gemini. All closed. All proprietary. All paid.
MolmoWeb changes everything.
MolmoWeb in Practice: See + Act
MolmoWeb is a multimodal web agent — a model that combines visual understanding with action capability. "Multimodal" means it processes multiple input types: text and images simultaneously.
How it works is straightforward. Input: a webpage screenshot plus a natural language instruction. Output: an action, such as a click at precise coordinates, a scroll, or text to type into a field. The agent chains these actions in a loop until the task is complete.
Concrete examples:
- "Book a table on OpenTable for 2 people Saturday evening" → the agent navigates the site, selects the restaurant, fills out the form, confirms the reservation.
- "Extract all prices from this e-commerce page" → the agent scrolls the page, reads each product card, returns structured data.
- "Fill out this contact form with these details" → the agent identifies fields, fills them in, clicks Submit.
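The see-act loop described above can be sketched in a few lines. Everything below is illustrative: `query_model` stands in for a real MolmoWeb call (the article doesn't specify an API), and the textual action format `click(x, y)` / `type("…")` / `done()` is an assumption, not MolmoWeb's documented output.

```python
import re
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "click", "type", "scroll", or "done"
    args: tuple = ()

def parse_action(raw: str) -> Action:
    """Parse a model reply like 'click(320, 140)' into a structured Action.
    The action grammar here is a hypothetical stand-in."""
    m = re.match(r"(\w+)\((.*)\)$", raw.strip())
    if not m:
        return Action("done")
    arg_str = m.group(2)
    args = tuple(a.strip().strip('"') for a in arg_str.split(",")) if arg_str else ()
    return Action(m.group(1), args)

def run_agent(instruction, take_screenshot, query_model, execute, max_steps=10):
    """Minimal see-act loop: screenshot -> model -> action, until 'done'."""
    history = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = parse_action(query_model(screenshot, instruction, history))
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history

# Demo with stubbed callbacks standing in for a browser and the model:
replies = iter(['click(320, 140)', 'type("2 people")', 'done()'])
log = run_agent(
    "Book a table for 2",
    take_screenshot=lambda: b"<png bytes>",
    query_model=lambda s, i, h: next(replies),
    execute=lambda a: None,
)
print([a.kind for a in log])  # → ['click', 'type']
```

The `max_steps` cap matters in practice: a real agent can loop forever on a page it misreads, so every harness needs a step budget.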
Two sizes are available: 4B parameters (lightweight, fast) and 8B parameters (more capable, the one that beats proprietary models). Parameters are the weights of the neural network: more parameters generally means more capability, but also more computational resources.
An Open-Source 8B Model > GPT-4o on the Web
The most striking number: MolmoWeb 8B outperforms GPT-4o on autonomous web navigation benchmarks. Benchmarks are standardized tests that measure model performance on specific tasks.
GPT-4o is estimated at roughly 1 trillion parameters. MolmoWeb has 8 billion. That's 125 times smaller — and it wins.
How? Specialized training beats raw scale on specific tasks. It's the same logic as DeepSeek V4 in March 2026: efficiency beats brute force when the model is optimized for a precise task.
| Criterion | GPT-4o (OpenAI) | Claude Computer Use | MolmoWeb 8B (AI2) |
|---|---|---|---|
| Open-source | ❌ | ❌ | ✅ |
| Model size | ~1T params (estimated) | Not disclosed | 8B params |
| Navigation performance | ✅ Excellent | ✅ Excellent | ✅ Superior (benchmarks) |
| Usage cost | 💸 Paid API | 💸 Paid API | 🆓 Free |
| Local deployment | ❌ | ❌ | ✅ Hugging Face |
| Availability | Cloud API only | Cloud API only | Public weights |
The fundamental difference: any developer can download MolmoWeb and run it on their own infrastructure tomorrow. No API key. No subscription. No dependency on a cloud provider.
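AI2's earlier Molmo checkpoints load through the standard `transformers` remote-code path, so local use of MolmoWeb would plausibly look like the sketch below. The repo id `allenai/MolmoWeb-8B` and the prompt template are assumptions based on this article, not a published API.

```python
def load_agent(repo_id: str = "allenai/MolmoWeb-8B"):
    """Load processor and weights from Hugging Face.

    The repo id is hypothetical. Earlier Molmo checkpoints ship custom
    model code, hence trust_remote_code=True. The import is deferred so
    the sketch stays importable without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return processor, model

def build_instruction(task: str, step: int) -> str:
    """Assemble the text half of the multimodal prompt; the screenshot is
    passed separately to the processor. The template is illustrative."""
    return f"Step {step}. Task: {task}\nReply with a single action."

print(build_instruction("Extract all prices", 1))
```

No API key appears anywhere in this flow: the weights are fetched once and inference runs entirely on your own hardware.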
AI2: The Quiet Anti-OpenAI
AI2 — the Allen Institute for AI — is a nonprofit research organization based in Seattle. Founded by Microsoft co-founder Paul Allen, it has a clear mission: keep AI research open and accessible rather than monopolized by three labs.
Molmo, AI2's series of open-source multimodal models, launched in 2024 and was already recognized for the quality of its image understanding. MolmoWeb extends it to action: the model no longer just sees, it acts.
The trajectory is consistent. Where OpenAI closes its models, Google keeps Gemini behind an API, and Anthropic limits Computer Use to paying customers, AI2 publishes full weights on Hugging Face. For free.
The irony: the least covered organization this week may have released the most practically useful model.
The Week of Agents: The Underlying Signal
Figma opened its canvas to agents on Monday. Linear declared issue tracking dead this morning. Google Research published TurboQuant to compress LLM inference. And AI2 releases MolmoWeb tonight.
| Date | Launch | Company | Signal |
|---|---|---|---|
| Mon Mar 23 | Lovable acquisitions | Lovable | Vibe coding consolidation |
| Tue Mar 24 | Cursor × Kimi | Cursor + Moonshot | US-China AI dev stack |
| Tue Mar 24 | Figma Canvas Agents | Figma | Agentic design |
| Wed Mar 25 | Linear Agent | Linear | "Issue tracking is dead" |
| Wed Mar 25 | TurboQuant | Google Research | 6x KV cache memory |
| Wed Mar 25 | MolmoWeb | AI2 | Open-source web agent |
Agents are no longer a speculative future. They're in design tools. In product management. In inference infrastructure. And now in any browser, via a free, open-source model with 8 billion parameters.
The question is no longer "when are agents coming?" It's "is your stack agent-ready?"
Key Takeaways
- AI2 (Allen Institute for AI) releases MolmoWeb, an open-source multimodal web agent available in 4B and 8B parameter versions on Hugging Face
- MolmoWeb reads webpage screenshots and performs autonomous actions: click, scroll, navigate, fill out forms
- The 8B model outperforms proprietary agents from OpenAI, Google and Anthropic on autonomous web navigation benchmarks
- First competitive open-source model for autonomous web navigation — democratizes RPA (Robotic Process Automation) agents and workflow automation
- Part of a historic week for AI agents: Figma Canvas Agents, Linear Agent, TurboQuant, and now MolmoWeb
MolmoWeb is the quietest and perhaps the most important signal of the week. When an open-source model with 8 billion parameters outperforms the best proprietary agents from the three biggest AI labs at web navigation, the barrier to entry for autonomous agents just fell. Figma opened its canvas to agents. Linear killed issue tracking. And AI2 just put a web agent in the hands of every developer. Agents are no longer a premium feature — they're infrastructure.


