POST /crawl

Follow links from a seed URL and run the extraction schema on each visited page.

```http
POST https://api.scrapewithruno.com/v1/crawl
X-API-Key: sk_live_...
Content-Type: application/json
```

Request Body

```json
{
  "seed_url": "https://example.com/blog",
  "schema": [
    { "field": "title",       "type": "string", "example": "Example post" },
    { "field": "publishedAt", "type": "date",   "example": "2024-12-20" }
  ],
  "crawl": {
    "follow_pattern": "https://example.com/blog/*",
    "max_pages": 50,
    "max_depth": 2,
    "allow_large_crawl": false
  },
  "options": { "render_js": "auto", "timeout_ms": 15000 },
  "async_mode": false
}
```
| Field | Type | Required | Notes |
|---|---|---|---|
| `seed_url` | string | yes | Where to start |
| `schema` | array | dynamic keys only | Applied to every visited page |
| `crawl.follow_pattern` | string | yes | Glob; only matching links are followed |
| `crawl.max_pages` | integer | yes | Hard ceiling on pages visited |
| `crawl.max_depth` | integer | no | Hops from the seed; default 2 |
| `crawl.allow_large_crawl` | boolean | no | Bypass the 25%-of-quota cap (see below) |
| `options.*` | - | no | Same as `/extract`, applied per page |
| `async_mode` | boolean | no | Route LLM extraction through the Batch API (up to 24h) |
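The exact glob semantics of `follow_pattern` aren't specified above; as a rough illustration, Python's `fnmatch` behaves similarly, with the caveat that `*` matches across `/` as well (the URLs below are the doc's own example domain):

```python
from fnmatch import fnmatch

# Illustrative only -- the API's precise glob rules are an assumption here.
pattern = "https://example.com/blog/*"

links = [
    "https://example.com/blog/post-1",        # matches
    "https://example.com/blog/2024/archive",  # also matches: '*' spans '/'
    "https://example.com/about",              # does not match
]

followed = [u for u in links if fnmatch(u, pattern)]
print(followed)
```

If you need to exclude deeper paths, filter the crawl results yourself rather than relying on `*` stopping at a slash.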

Quota Safety Cap

max_pages is silently capped at floor(remaining_monthly_quota × 0.25) at the start of the call. Excess units are refunded immediately. Set crawl.allow_large_crawl: true to remove the cap and reserve the full requested amount.
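A minimal sketch of the cap arithmetic (the helper function is hypothetical, not part of the API):

```python
import math

def effective_max_pages(requested, remaining_quota, allow_large_crawl=False):
    """Apply the 25%-of-remaining-quota cap unless explicitly bypassed."""
    if allow_large_crawl:
        return requested
    cap = math.floor(remaining_quota * 0.25)
    return min(requested, cap)

# With 120 units left this month, a request for 50 pages is capped at 30.
print(effective_max_pages(50, 120))        # 30
print(effective_max_pages(50, 120, True))  # 50
```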

Behavior

  • Respects robots.txt.
  • Reads sitemap.xml when present.
  • Applies per-host jitter and adaptive back-off to avoid tripping rate limits or bans on the target site.
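The back-off runs server-side, so you don't implement it yourself; as a rough sketch of what "jitter and adaptive back-off" typically means (full-jitter exponential back-off; all constants here are illustrative, not Runo's):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter back-off: sleep a random amount up to
    min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

for attempt in range(5):
    print(f"attempt {attempt}: up to {min(60.0, 2 ** attempt):.0f}s "
          f"-> {backoff_delay(attempt):.2f}s")
```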

Response

```json
{
  "seed_url": "https://example.com/blog",
  "results": [
    { "url": "https://example.com/blog/post-1", "status": "success", "render_mode": "fetch", "data": { "title": "...", "publishedAt": "2024-11-01" } },
    { "url": "https://example.com/blog/post-2", "status": "success", "render_mode": "fetch", "data": { "title": "...", "publishedAt": "2024-11-08" } }
  ],
  "crawl_meta": {
    "pages_visited": 17,
    "pages_skipped": 3,
    "pages_failed": 0,
    "cancelled": false
  }
}
```

Cost

`max_pages` units are reserved when the call starts. Any units for pages that end up not being visited are refunded to your monthly counter (and to your Pro/Scale overage credit balance, if applicable).
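A back-of-the-envelope sketch of the reserve-then-refund accounting (the helper and its names are illustrative, not an API call):

```python
def crawl_units(max_pages, pages_visited):
    """Reserve max_pages units up front; refund the unvisited remainder."""
    reserved = max_pages
    refunded = reserved - pages_visited
    charged = pages_visited
    return reserved, refunded, charged

# E.g. a 50-page reservation that visited only 17 pages:
reserved, refunded, charged = crawl_units(50, 17)
print(reserved, refunded, charged)  # 50 33 17
```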

Examples

```bash
curl -X POST https://api.scrapewithruno.com/v1/crawl \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "seed_url": "https://example.com/blog",
    "schema": [
      { "field": "title",       "type": "string", "example": "Example post" },
      { "field": "publishedAt", "type": "date",   "example": "2024-12-20" }
    ],
    "crawl": {
      "follow_pattern": "https://example.com/blog/*",
      "max_pages": 25,
      "max_depth": 2
    }
  }'
```
```js
const res = await fetch('https://api.scrapewithruno.com/v1/crawl', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.RUNO_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    seed_url: 'https://example.com/blog',
    schema: [
      { field: 'title',       type: 'string', example: 'Example post' },
      { field: 'publishedAt', type: 'date',   example: '2024-12-20' },
    ],
    crawl: {
      follow_pattern: 'https://example.com/blog/*',
      max_pages: 25,
      max_depth: 2,
    },
  }),
})
const { results, crawl_meta } = await res.json()
console.log(crawl_meta, results.length)
```
```py
import os, requests

res = requests.post(
    "https://api.scrapewithruno.com/v1/crawl",
    headers={"X-API-Key": os.environ["RUNO_API_KEY"]},
    json={
        "seed_url": "https://example.com/blog",
        "schema": [
            {"field": "title",       "type": "string", "example": "Example post"},
            {"field": "publishedAt", "type": "date",   "example": "2024-12-20"},
        ],
        "crawl": {
            "follow_pattern": "https://example.com/blog/*",
            "max_pages": 25,
            "max_depth": 2,
        },
    },
    timeout=600,
)
body = res.json()
print(body["crawl_meta"])
print(len(body["results"]), "pages")
```

Released under the terms of Runo’s Terms of Use.