Cloudflare Launches /crawl: Crawl an Entire Website with a Single API Request
Cloudflare is adding a /crawl endpoint to its Browser Rendering service. Developers building RAG pipelines can now explore a complete website and extract its content as HTML, Markdown, or structured JSON with a single API request.

Cloudflare has just enriched its Browser Rendering service with a new /crawl endpoint, available now in open beta on Workers Free and Paid plans. The goal: let any developer explore a complete website — JavaScript included — and extract content in the format of their choice, via a single API request.
One API Call, an Entire Website Explored
The concept is straightforward. The developer sends a starting URL to the /crawl endpoint, and Browser Rendering handles the rest: it follows links and sitemaps, loads each page in a real browser (with JavaScript execution), then returns the content in HTML, Markdown, or structured JSON — the latter generated via Cloudflare's embedded AI models.
Exploration runs in the background. The API immediately returns a crawl identifier, which you then query to retrieve results as processing progresses. Several parameters allow fine-tuning the scope:
- Crawl depth and maximum number of pages
- URL pattern filters to include or exclude certain paths
- Incremental crawl to skip pages unchanged since the last exploration
- Static mode: raw HTML retrieval without JavaScript, faster for static sites
- Respect for robots.txt directives, including request delay
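The asynchronous flow and the parameters above can be sketched as a small client. The endpoint path, auth header, and response shape in this sketch are assumptions modeled on Cloudflare's usual API conventions; only the parameter names come from the announcement:

```typescript
// Hedged sketch of a /crawl client. The URL, Authorization header, and
// `result.id` response field are assumptions; startUrl, maxDepth, maxPages,
// outputFormat, incremental, respectRobotsTxt, and staticMode are the
// parameters named in the announcement.

interface CrawlOptions {
  startUrl: string;
  maxDepth?: number;
  maxPages?: number;
  outputFormat?: "html" | "markdown" | "json";
  incremental?: boolean;
  respectRobotsTxt?: boolean;
  staticMode?: boolean;
}

// Build the JSON body for the crawl request, dropping undefined fields.
function buildCrawlRequest(opts: CrawlOptions): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(opts).filter(([, value]) => value !== undefined),
  );
}

// Submit a crawl and return the crawl identifier to poll for results.
async function startCrawl(
  accountId: string,
  apiToken: string,
  opts: CrawlOptions,
): Promise<string> {
  const res = await fetch(
    // Hypothetical endpoint path, following the client/v4 convention.
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(buildCrawlRequest(opts)),
    },
  );
  const data = await res.json();
  return data.result.id; // assumed response shape
}
```

Because the crawl runs in the background, the returned identifier is what you hold on to; results are fetched later as processing progresses.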
This announcement follows Markdown for Agents, launched a few weeks ago, which automatically converts HTML to Markdown for AI agents. The /crawl endpoint goes further: it automates the entire content ingestion pipeline.
The Primary Use Case: Feeding RAG Pipelines
Cloudflare explicitly targets developers building AI applications — particularly RAG (Retrieval-Augmented Generation) pipelines, which need to regularly index web content to enrich language model responses.
Until now, this type of workflow required configuring third-party tools (Scrapy, Puppeteer, Playwright), managing browser instances, and manually handling pagination and JavaScript. With /crawl, all this work is delegated to Cloudflare's infrastructure. It's a significant simplification for developers connecting content sources to their AI assistants via the MCP protocol or other agent orchestrators.
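On the ingestion side, assuming the crawl returns one Markdown document per page, a minimal chunker might split pages on headings before embedding and indexing them. This is illustrative glue code, not part of the Cloudflare API:

```typescript
// Minimal Markdown chunker for a RAG pipeline: split each crawled page on
// top-level headings so chunks stay topically coherent, falling back to
// paragraph-level splits when a section exceeds the size budget.
function chunkMarkdown(markdown: string, maxChars = 2000): string[] {
  const sections = markdown.split(/\n(?=#{1,2} )/); // break before # / ## headings
  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      if (section.trim()) chunks.push(section.trim());
      continue;
    }
    // Oversized section: accumulate paragraphs up to maxChars per chunk.
    let current = "";
    for (const para of section.split(/\n\n+/)) {
      if (current && (current + "\n\n" + para).length > maxChars) {
        chunks.push(current.trim());
        current = para;
      } else {
        current = current ? current + "\n\n" + para : para;
      }
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}
```

Each chunk can then be passed to whatever embedding model and vector store the pipeline already uses.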
Other highlighted use cases: model training, site-wide content monitoring, and automated competitive intelligence.
Cloudflare, Arbiter Between Publishers and AI
This announcement reveals a tension at the heart of Cloudflare's strategy. On one side, the company has been developing publisher protection tools for several months: AI Labyrinth (which traps AI crawlers in generated pages), the Pay per Crawl model launched with Stack Overflow, and default blocking of AI crawlers on new domains. On the other, it now provides developers with the means to crawl the web at scale.
This central intermediary position isn't a paradox: it's a business model. By positioning itself between publishers and AI systems, Cloudflare — which powers roughly 20% of the global web — aims to become the reference infrastructure for content exchanges between humans and machines.
The question of who controls access to web content is at the heart of current tensions, as illustrated by the court ruling against Perplexity's Comet agent, which recently set the first legal precedent on autonomous shopping agents.
/crawl Endpoint at a Glance
| Parameter | Description |
|---|---|
| `startUrl` | Starting URL for the crawl |
| `maxDepth` | Maximum navigation depth |
| `maxPages` | Maximum number of pages explored |
| `outputFormat` | `html`, `markdown`, or `json` |
| `incremental` | Skip pages unchanged since the last crawl |
| `respectRobotsTxt` | Respect robots.txt directives |
| `staticMode` | Raw HTML without JavaScript execution |
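Since results arrive asynchronously, a client needs a polling loop around the crawl identifier. The status values below ("running", "completed") and the response shape are hypothetical; injecting the status fetcher keeps the loop testable without network access:

```typescript
// Hedged sketch of polling a background crawl until completion. The
// CrawlStatus shape and status strings are assumptions, not Cloudflare's
// documented schema.
type CrawlStatus = { status: "running" | "completed"; pages?: unknown[] };

async function waitForCrawl(
  fetchStatus: () => Promise<CrawlStatus>, // e.g. a fetch() against the crawl ID
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<CrawlStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchStatus();
    if (state.status === "completed") return state;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl did not finish within the polling budget");
}
```

In practice `fetchStatus` would wrap a GET on the crawl identifier returned by the initial request, e.g. `waitForCrawl(() => fetch(statusUrl).then((r) => r.json()))`.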
The /crawl endpoint is available now in open beta on Workers Free and Paid plans. Documentation is accessible on the Cloudflare developer portal.


