Cloudflare Launches /crawl: Crawl an Entire Website with a Single API Request
Cloudflare is adding a /crawl endpoint to its Browser Rendering service. Developers building RAG pipelines can now explore a complete website and extract its content as HTML, Markdown, or structured JSON with a single API request.

Cloudflare has just enriched its Browser Rendering service with a new /crawl endpoint, available now in open beta on Workers Free and Paid plans. The goal: let any developer explore a complete website — JavaScript included — and extract content in the format of their choice, via a single API request.
One API Call, an Entire Website Explored
The concept is straightforward. The developer sends a starting URL to the /crawl endpoint, and Browser Rendering handles the rest: it follows links and sitemaps, loads each page in a real browser (with JavaScript execution), then returns the content in HTML, Markdown, or structured JSON — the latter generated via Cloudflare's embedded AI models.
Exploration runs in the background. The API immediately returns a crawl identifier, which you then query to retrieve results as processing progresses. Several parameters allow fine-tuning the scope:
- Crawl depth and maximum number of pages
- URL pattern filters to include or exclude certain paths
- Incremental crawl to skip pages unchanged since the last exploration
- Static mode: raw HTML retrieval without JavaScript, faster for static sites
- Respect for robots.txt directives, including request delay
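The asynchronous flow and the parameters above can be sketched as a small client. The endpoint path, auth header, and response shape in this sketch are assumptions modeled on Cloudflare's usual API conventions; only the parameter names come from the announcement:

```typescript
// Hedged sketch of a /crawl client. The URL, Authorization header, and
// `result.id` response field are assumptions; startUrl, maxDepth, maxPages,
// outputFormat, incremental, respectRobotsTxt, and staticMode are the
// parameters named in the announcement.

interface CrawlOptions {
  startUrl: string;
  maxDepth?: number;
  maxPages?: number;
  outputFormat?: "html" | "markdown" | "json";
  incremental?: boolean;
  respectRobotsTxt?: boolean;
  staticMode?: boolean;
}

// Build the JSON body for the crawl request, dropping undefined fields.
function buildCrawlRequest(opts: CrawlOptions): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(opts).filter(([, value]) => value !== undefined),
  );
}

// Submit a crawl and return the crawl identifier to poll for results.
async function startCrawl(
  accountId: string,
  apiToken: string,
  opts: CrawlOptions,
): Promise<string> {
  const res = await fetch(
    // Hypothetical endpoint path, following the client/v4 convention.
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(buildCrawlRequest(opts)),
    },
  );
  const data = await res.json();
  return data.result.id; // assumed response shape
}
```

Because the crawl runs in the background, the returned identifier is what you hold on to; results are fetched later as processing progresses.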
This announcement follows Markdown for Agents, launched a few weeks ago, which automatically converts HTML to Markdown for AI agents. The /crawl endpoint goes further: it automates the entire content ingestion pipeline.
The Primary Use Case: Feeding RAG Pipelines
Cloudflare explicitly targets developers building AI applications — particularly RAG (Retrieval-Augmented Generation) pipelines, which need to regularly index web content to enrich language model responses.
Until now, this type of workflow required configuring third-party tools (Scrapy, Puppeteer, Playwright), managing browser instances, and manually handling pagination and JavaScript. With /crawl, all this work is delegated to Cloudflare's infrastructure. It's a significant simplification for developers connecting content sources to their AI assistants via the MCP protocol or other agent orchestrators.
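On the ingestion side, assuming the crawl returns one Markdown document per page, a minimal chunker might split pages on headings before embedding and indexing them. This is illustrative glue code, not part of the Cloudflare API:

```typescript
// Minimal Markdown chunker for a RAG pipeline: split each crawled page on
// top-level headings so chunks stay topically coherent, falling back to
// paragraph-level splits when a section exceeds the size budget.
function chunkMarkdown(markdown: string, maxChars = 2000): string[] {
  const sections = markdown.split(/\n(?=#{1,2} )/); // break before # / ## headings
  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      if (section.trim()) chunks.push(section.trim());
      continue;
    }
    // Oversized section: accumulate paragraphs up to maxChars per chunk.
    let current = "";
    for (const para of section.split(/\n\n+/)) {
      if (current && (current + "\n\n" + para).length > maxChars) {
        chunks.push(current.trim());
        current = para;
      } else {
        current = current ? current + "\n\n" + para : para;
      }
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}
```

Each chunk can then be passed to whatever embedding model and vector store the pipeline already uses.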
Other highlighted use cases: model training, site-wide content monitoring, and automated competitive intelligence.
Cloudflare, Arbiter Between Publishers and AI
This announcement reveals a tension at the heart of Cloudflare's strategy. On one side, the company has been developing publisher protection tools for several months: AI Labyrinth (which traps AI crawlers in generated pages), the Pay per Crawl model launched with Stack Overflow, and default blocking of AI crawlers on new domains. On the other, it now provides developers with the means to crawl the web at scale.
This central intermediary position isn't a paradox: it's a business model. By positioning itself between publishers and AI systems, Cloudflare — which powers roughly 20% of the global web — aims to become the reference infrastructure for content exchanges between humans and machines.
The question of who controls access to web content is at the heart of current tensions, as illustrated by the court ruling against Perplexity's Comet agent, which recently set the first legal precedent on autonomous shopping agents.
/crawl Endpoint at a Glance
| Parameter | Description |
|---|---|
| `startUrl` | Starting URL for the crawl |
| `maxDepth` | Maximum navigation depth |
| `maxPages` | Maximum number of pages explored |
| `outputFormat` | `html`, `markdown`, or `json` |
| `incremental` | Skip pages unchanged since the last crawl |
| `respectRobotsTxt` | Respect robots.txt directives |
| `staticMode` | Raw HTML without JavaScript execution |
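Since results arrive asynchronously, a client needs a polling loop around the crawl identifier. The status values below ("running", "completed") and the response shape are hypothetical; injecting the status fetcher keeps the loop testable without network access:

```typescript
// Hedged sketch of polling a background crawl until completion. The
// CrawlStatus shape and status strings are assumptions, not Cloudflare's
// documented schema.
type CrawlStatus = { status: "running" | "completed"; pages?: unknown[] };

async function waitForCrawl(
  fetchStatus: () => Promise<CrawlStatus>, // e.g. a fetch() against the crawl ID
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<CrawlStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchStatus();
    if (state.status === "completed") return state;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl did not finish within the polling budget");
}
```

In practice `fetchStatus` would wrap a GET on the crawl identifier returned by the initial request, e.g. `waitForCrawl(() => fetch(statusUrl).then((r) => r.json()))`.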
The /crawl endpoint is available now in open beta on Workers Free and Paid plans. Documentation is accessible on the Cloudflare developer portal.


