# Crawling & Batching
Two endpoints when one URL isn't enough.
## /batch - Same Schema, Many URLs

Use `/batch` when you already have the URL list. It returns one `/extract` response per URL, in order.
```json
{
  "urls": [
    "https://example.com/p/1",
    "https://example.com/p/2",
    "https://example.com/p/3"
  ],
  "schema": [
    { "field": "title", "type": "string", "example": "Example title" },
    { "field": "price", "type": "float", "example": 19.99 }
  ],
  "options": { "concurrency": 4, "fail_fast": false }
}
```

| Option | Default | Meaning |
|---|---|---|
| `concurrency` | server-tuned | Max parallel fetches inside this batch |
| `fail_fast` | `false` | If `true`, return as soon as the first URL errors |
**Cost:** 1 request per URL. Failures still count.
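For example, a minimal call might look like this (a sketch; it assumes `/batch` is served at the same `/v1` base path as the `/crawl` example further down this page):

```bash
# Sketch of a /batch call. The /v1/batch path is an assumption,
# mirroring the /v1/crawl path used in the example below.
curl -X POST https://api.scrapewithruno.com/v1/batch \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/p/1",
      "https://example.com/p/2"
    ],
    "schema": [
      { "field": "title", "type": "string", "example": "Example title" }
    ],
    "options": { "concurrency": 2, "fail_fast": false }
  }'
```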
See `POST /batch` for the full reference.
## /crawl - Follow Links From a Seed

Use `/crawl` when you have a starting URL and a pattern of links to follow.
```json
{
  "seed_url": "https://example.com/blog",
  "schema": [
    { "field": "title", "type": "string", "example": "Example post" },
    { "field": "publishedAt", "type": "date", "example": "2024-12-20" }
  ],
  "crawl": {
    "follow_pattern": "https://example.com/blog/*",
    "max_pages": 50,
    "max_depth": 2
  }
}
```

| Crawl Option | Notes |
|---|---|
| `follow_pattern` | Only links matching this pattern are followed |
| `max_pages` | Hard ceiling on pages visited |
| `max_depth` | Maximum link hops from `seed_url` |
| `allow_large_crawl` | Bypass the 25%-of-quota safety cap (see below) |
The crawler:

- Respects `robots.txt`.
- Reads `sitemap.xml` when present.
- Applies per-host jitter and adaptive back-off so you don't get banned.
- Returns an array of per-page results plus `crawl_meta` (pages visited / skipped / failed).
**Quota safety:** `max_pages` is silently capped up front at `floor(remaining_monthly_quota × 0.25)`; with 1,000 units left in the month, for example, a crawl can visit at most 250 pages. The unused units are refunded immediately. Pass `"allow_large_crawl": true` to remove the cap.
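A request that opts out of the cap might look like this (a sketch reusing the blog schema from above; `max_pages: 500` is an arbitrary illustration):

```bash
# Sketch: a crawl permitted to exceed the 25%-of-quota cap
# by setting allow_large_crawl inside the crawl object.
curl -X POST https://api.scrapewithruno.com/v1/crawl \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "seed_url": "https://example.com/blog",
    "schema": [
      { "field": "title", "type": "string", "example": "Example post" }
    ],
    "crawl": {
      "follow_pattern": "https://example.com/blog/*",
      "max_pages": 500,
      "allow_large_crawl": true
    }
  }'
```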
See `POST /crawl` for the full reference.
## Picking Between Them
| Scenario | Use |
|---|---|
| You have a known list of URLs | `/batch` |
| You have one URL and want to discover related pages | `/crawl` |
| You're paginating a single feed (`?page=1..N`) | Build the URLs yourself and use `/batch` (cheaper and simpler; sketch below) |
| The site is JS-heavy and the link graph is hard to predict | `/crawl` with `render_js: "always"` |
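For the pagination row, here is a sketch of building the URL list in the shell (it assumes `jq` is installed and that the hypothetical feed paginates with `?page=N`):

```bash
# Sketch: generate page URLs 1..10 and submit them as one /batch request.
# The feed URL is hypothetical; jq turns the line list into a JSON array.
urls=$(for i in $(seq 1 10); do
  echo "https://example.com/feed?page=$i"
done | jq -R . | jq -s .)

curl -X POST https://api.scrapewithruno.com/v1/batch \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"urls\": $urls,
    \"schema\": [{ \"field\": \"title\", \"type\": \"string\", \"example\": \"Example title\" }]
  }"
```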
## Example: Blog Archive
```bash
curl -X POST https://api.scrapewithruno.com/v1/crawl \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "seed_url": "https://example.com/blog",
    "schema": [
      { "field": "title", "type": "string", "example": "Example post" },
      { "field": "publishedAt", "type": "date", "example": "2024-12-20" }
    ],
    "crawl": {
      "follow_pattern": "https://example.com/blog/*",
      "max_pages": 25,
      "max_depth": 2
    }
  }'
```

Returns:
```json
{
  "seed_url": "https://example.com/blog",
  "results": [
    { "url": "https://example.com/blog/post-1", "status": "success", "data": { "title": "...", "publishedAt": "2024-11-01" } },
    { "url": "https://example.com/blog/post-2", "status": "success", "data": { "title": "...", "publishedAt": "2024-11-08" } }
  ],
  "crawl_meta": {
    "pages_visited": 17,
    "pages_skipped": 3,
    "pages_failed": 0,
    "cancelled": false
  }
}
```
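If you save that response to a file, a couple of `jq` one-liners (assuming `jq` is installed) pull out the crawl bookkeeping and the successful rows:

```bash
# Sketch: inspect a saved /crawl response with jq.
# crawl_meta summarizes what the crawler actually did:
jq '.crawl_meta' response.json

# Count pages that extracted successfully:
jq '[.results[] | select(.status == "success")] | length' response.json
```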