Image Augmentation

Some pages put the data you want inside images: a price tag overlay, a stat block on a poster, a number stamped on a product photo. Runo can read it.

What It Does

Set options.process_images: true on /extract (or any batch/crawl request). After the normal text extraction, if any of your fields came back null, Runo:

  1. Scores the page's <img> candidates against the null field names (alt text + filename token overlap, with a same-origin boost; trackers and decorative thumbnails are skipped).
  2. Fetches up to 3 top candidates concurrently (5 s timeout each, ≤ 5 MB).
  3. Sends them to an LLM in a single multimodal call targeting only the still-null fields.
  4. Merges any newly-extracted values into the response.

The image pass is deliberately narrow:

  • It only fires when there are nulls left to fill.
  • It only sends the candidate images, not the whole page.
  • It only asks the model to fill the missing fields.
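
To make the scoring in step 1 concrete, here is a minimal sketch of token-overlap scoring with a same-origin boost. It is not Runo's actual scorer: the function names, the tokenizer, and the 0.5 boost value are all assumptions for illustration, and the tracker/thumbnail filtering is left out.

```python
import re
from urllib.parse import urljoin, urlparse


def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_image(page_url, img_src, alt, null_fields, same_origin_boost=0.5):
    """Score one <img> by how many null-field tokens appear in its alt text or filename.

    The boost value is a hypothetical choice, not a documented Runo constant.
    """
    abs_src = urljoin(page_url, img_src)
    parsed = urlparse(abs_src)
    filename = parsed.path.rsplit("/", 1)[-1]
    candidate_tokens = tokens(alt) | tokens(filename)

    field_tokens = set()
    for name in null_fields:
        field_tokens |= tokens(name)

    score = float(len(candidate_tokens & field_tokens))
    # Only boost images that already matched something.
    if score > 0 and parsed.netloc == urlparse(page_url).netloc:
        score += same_origin_boost
    return score


def top_candidates(page_url, images, null_fields, limit=3):
    """Return up to `limit` image sources with a non-zero score, best first.

    `images` is a list of (src, alt) pairs.
    """
    scored = [(score_image(page_url, src, alt, null_fields), src) for src, alt in images]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [src for _, src in scored[:limit]]
```

With null fields `price` and `rating`, an image named `price-tag.png` on the page's own origin would outrank a `rating.jpg` served from another host, and an unrelated `hero.jpg` would be dropped.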

If the image pass fails (timeout, fetch error, model error), the original text-only result is returned unchanged.
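
The merge and fallback behavior can be pictured roughly as follows. This is a sketch under assumptions, not Runo's code: `merge_image_values` and the `image_pass` callable are hypothetical names standing in for the fetch-and-model step.

```python
def merge_image_values(text_result, image_pass):
    """Fill only the null fields of `text_result` using `image_pass`.

    `image_pass` is a hypothetical callable that takes the list of null
    field names and returns a dict of extracted values. If it raises
    (timeout, fetch error, model error), the text-only result is
    returned unchanged.
    """
    null_fields = [name for name, value in text_result.items() if value is None]
    if not null_fields:
        return text_result  # nothing to fill, so the pass never runs

    try:
        found = image_pass(null_fields)
    except Exception:
        return text_result  # fall back to the original text-only result

    merged = dict(text_result)
    for name in null_fields:
        value = found.get(name)
        if value is not None:
            merged[name] = value  # values already extracted from text are never overwritten
    return merged
```

Because only fields that were null are ever written, a value already extracted from the page text cannot be replaced by something read from an image.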

Plan Availability

process_images is Scale-tier only. Lower tiers ignore the flag.

Example

bash
curl -X POST https://api.scrapewithruno.com/v1/extract \
  -H "X-API-Key: $RUNO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/listings/abc",
    "schema": [
      { "field": "title",  "type": "string", "example": "Example" },
      { "field": "price",  "type": "float",  "example": 19.99 },
      { "field": "rating", "type": "float",  "example": 4.6 }
    ],
    "options": { "process_images": true }
  }'
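
For callers working in Python, the same request might be built with the standard library along these lines. The endpoint, header, and payload mirror the curl call above; the helper names `build_extract_request` and `send` are illustrative, and the 60 s client timeout is an arbitrary assumption.

```python
import json
import urllib.request

API_URL = "https://api.scrapewithruno.com/v1/extract"


def build_extract_request(url, schema, api_key, process_images=True):
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {
        "url": url,
        "schema": schema,
        "options": {"process_images": process_images},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage, assuming RUNO_API_KEY is set in the environment:
#   import os
#   req = build_extract_request(
#       "https://example.com/listings/abc",
#       [{"field": "price", "type": "float", "example": 19.99}],
#       os.environ["RUNO_API_KEY"],
#   )
#   result = send(req)
```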

The response includes images_processed:

json
{
  "url": "...",
  "status": "success",
  "render_mode": "headless",
  "data": { "title": "Vintage Lamp", "price": 49.00, "rating": 4.8 },
  "images_processed": 2
}

images_processed is null when the pass did not run: no nulls remained, the plan is below Scale, the flag was off, or none of the page's images scored as a candidate.
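
On the client side, a response can be inspected like this to tell whether the image pass ran and which fields are still missing. The helper name `summarize_image_pass` and the shape of its return value are assumptions made for this sketch; only `data` and `images_processed` come from the response format shown above.

```python
def summarize_image_pass(response):
    """Report whether the image pass ran and which fields remain null."""
    data = response.get("data") or {}
    processed = response.get("images_processed")
    return {
        # null (None) means the pass did not run at all
        "image_pass_ran": processed is not None,
        "images_processed": processed or 0,
        "still_null": sorted(name for name, value in data.items() if value is None),
    }


example = {
    "url": "...",
    "status": "success",
    "render_mode": "headless",
    "data": {"title": "Vintage Lamp", "price": 49.00, "rating": 4.8},
    "images_processed": 2,
}
summary = summarize_image_pass(example)
```

A response with `images_processed` set to null and fields still null suggests checking the plan tier and the flag before assuming the page has no usable images.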

When It Pays Off

Best on:

  • Listings with stats baked into the cover image (job ads, real estate, marketplace cards).
  • Product photos with prices/discounts overlaid.
  • Posters with running times, ratings, or release dates.

Less useful on text-heavy pages where the data is already in the DOM.

Released under the terms of Runo’s Terms of Use.