# POST /crawl

Follow links from a seed URL and extract on each page.

```
POST https://api.scrapewithruno.com/v1/crawl
X-API-Key: sk_live_...
Content-Type: application/json
```

## Request Body
```json
{
  "seed_url": "https://example.com/blog",
  "schema": [
    { "field": "title", "type": "string", "example": "Example post" },
    { "field": "publishedAt", "type": "date", "example": "2024-12-20" }
  ],
  "crawl": {
    "follow_pattern": "https://example.com/blog/*",
    "max_pages": 50,
    "max_depth": 2,
    "allow_large_crawl": false
  },
  "options": { "render_js": "auto", "timeout_ms": 15000 },
  "async_mode": false
}
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `seed_url` | string | yes | Where to start |
| `schema` | array | dynamic keys only | Applied to every visited page |
| `crawl.follow_pattern` | string | yes | Glob; only matching links are followed |
| `crawl.max_pages` | integer | yes | Hard ceiling on pages visited |
| `crawl.max_depth` | integer | no | Hops from the seed; default 2 |
| `crawl.allow_large_crawl` | boolean | no | Bypass the 25%-of-quota cap (see below) |
| `options.*` | - | no | Same as `/extract`, applied per page |
| `async_mode` | boolean | no | Route LLM extraction through the Batch API (up to 24 h) |
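To preview which discovered links a `crawl.follow_pattern` glob would keep, a quick client-side sketch can help. Python's `fnmatch` is used here as an approximation; the API's exact glob semantics are not specified on this page, and note that `fnmatch`'s `*` also crosses `/` boundaries.

```py
from fnmatch import fnmatch

# Illustration only: fnmatch is an assumption, not the service's matcher.
pattern = "https://example.com/blog/*"
links = [
    "https://example.com/blog/post-1",
    "https://example.com/about",
    "https://example.com/blog/2024/post-2",
]
followed = [u for u in links if fnmatch(u, pattern)]
print(followed)  # both /blog/ URLs; /about is dropped
```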
## Quota Safety Cap

`max_pages` is silently capped at `floor(remaining_monthly_quota × 0.25)` at the start of the call, and the excess units are refunded immediately. Set `crawl.allow_large_crawl: true` to remove the cap and reserve the full requested amount.
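The cap arithmetic can be sketched client-side to predict how many pages a call will actually reserve. This is an illustration of the rule above, not an API call:

```py
import math

def effective_max_pages(requested, remaining_quota, allow_large_crawl=False):
    """Predict the page ceiling the server applies, per the rule above."""
    if allow_large_crawl:
        return requested
    return min(requested, math.floor(remaining_quota * 0.25))

print(effective_max_pages(50, 120))        # 30: capped at floor(120 * 0.25)
print(effective_max_pages(50, 120, True))  # 50: cap bypassed
```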
## Behavior

- Respects `robots.txt`.
- Reads `sitemap.xml` when present.
- Applies per-host jitter and adaptive back-off so the target site doesn't ban you.
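The jitter and back-off above happen server-side against the target site. If you also wrap your own API calls in retries, a similar "full jitter" delay is a reasonable default; this is a generic sketch, not the service's exact algorithm:

```py
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    # Full jitter: pick uniformly between 0 and the exponential ceiling.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Call it with `attempt = 0, 1, 2, ...` between successive retries and sleep for the returned number of seconds.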
## Response

```json
{
  "seed_url": "https://example.com/blog",
  "results": [
    { "url": "https://example.com/blog/post-1", "status": "success", "render_mode": "fetch", "data": { "title": "...", "publishedAt": "2024-11-01" } },
    { "url": "https://example.com/blog/post-2", "status": "success", "render_mode": "fetch", "data": { "title": "...", "publishedAt": "2024-11-08" } }
  ],
  "crawl_meta": {
    "pages_visited": 17,
    "pages_skipped": 3,
    "pages_failed": 0,
    "cancelled": false
  }
}
```

## Cost
`max_pages` units are reserved at the start. Whatever isn't visited is refunded to your monthly counter (and to your Pro/Scale overage credit balance, if applicable).
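The refund arithmetic is simple enough to sketch (illustrative only, not an API call):

```py
def refunded_units(max_pages, pages_visited):
    # Units reserved up front minus pages actually visited.
    return max_pages - pages_visited

# With the sample request (max_pages 50) and sample response (17 visited):
print(refunded_units(50, 17))  # 33 units back on the monthly counter
```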
## Examples

```bash
curl -X POST https://api.scrapewithruno.com/v1/crawl \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "seed_url": "https://example.com/blog",
    "schema": [
      { "field": "title", "type": "string", "example": "Example post" },
      { "field": "publishedAt", "type": "date", "example": "2024-12-20" }
    ],
    "crawl": {
      "follow_pattern": "https://example.com/blog/*",
      "max_pages": 25,
      "max_depth": 2
    }
  }'
```
```js
const res = await fetch('https://api.scrapewithruno.com/v1/crawl', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.RUNO_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    seed_url: 'https://example.com/blog',
    schema: [
      { field: 'title', type: 'string', example: 'Example post' },
      { field: 'publishedAt', type: 'date', example: '2024-12-20' },
    ],
    crawl: {
      follow_pattern: 'https://example.com/blog/*',
      max_pages: 25,
      max_depth: 2,
    },
  }),
})
const { results, crawl_meta } = await res.json()
console.log(crawl_meta, results.length)
```
```py
import os, requests

res = requests.post(
    "https://api.scrapewithruno.com/v1/crawl",
    headers={"X-API-Key": os.environ["RUNO_API_KEY"]},
    json={
        "seed_url": "https://example.com/blog",
        "schema": [
            {"field": "title", "type": "string", "example": "Example post"},
            {"field": "publishedAt", "type": "date", "example": "2024-12-20"},
        ],
        "crawl": {
            "follow_pattern": "https://example.com/blog/*",
            "max_pages": 25,
            "max_depth": 2,
        },
    },
    timeout=600,
)
body = res.json()
print(body["crawl_meta"])
print(len(body["results"]), "pages")
```
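A crawl response can mix outcomes per page, so a small post-processing step keeps only the successful rows. `"success"` appears in the sample response above; the `"failed"` value used here is an assumption for illustration:

```py
from collections import Counter

results = [
    {"url": "https://example.com/blog/post-1", "status": "success"},
    {"url": "https://example.com/blog/post-2", "status": "failed"},  # assumed value
]
by_status = Counter(r["status"] for r in results)
succeeded = [r["url"] for r in results if r["status"] == "success"]
print(dict(by_status), succeeded)
```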