Scrape web content and forget about managing scraping infrastructure

Fetch content from any URL at scale and in real time. Get the result back as HTML or Markdown, with optional link extraction for crawling applications.

Introduction

Scraping the web at scale is complex: you have to manage proxies, run browsers, and work around anti-scraping measures. Learning those details can be rewarding, but when time is short a simpler option helps. Blat's /scrape and /scrape_sitemap endpoints are designed to simplify large-scale web scraping.

Blat's APIs are designed for simplicity, quality, and affordability. The /scrape endpoint is meant to work without intricate configuration, and Blat is committed to passing cost savings on to users without compromising on quality. You can focus on extracting the data you need rather than on the infrastructure behind it.

Scrape web content with the /scrape endpoint

The /scrape endpoint fetches content from a specified URL and returns it in either HTML or Markdown format. It also offers optional link extraction.

Key Features:

  • Format Selection: Choose html or markdown as the output format (the format field).

  • Link Extraction: Optionally extract links from the content (the extract_links field).

  • External and Subdomain Links: Control whether external and subdomain links are included (the allow_external_links and allow_subdomain_links fields).

Sample Request:

curl --request POST \
  --url https://api.blat.ai/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://example.com",
  "format": "html",
  "extract_links": true,
  "allow_external_links": false,
  "allow_subdomain_links": false
}'

Response Structure:

{
  "content": "<html>...</html>",
  "extracted_links": [
    "https://example.com/link1",
    "https://example.com/link2"
  ]
}

For comprehensive details, refer to the Scrape Endpoint Documentation.
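
The same request can also be issued from a script. The following is a minimal Python sketch that mirrors the curl sample using only the standard library. The function names (build_scrape_request, parse_scrape_response, scrape) are illustrative and not part of Blat's API, and the sketch assumes the response matches the sample structure above, with extracted_links possibly absent when link extraction is turned off.

```python
import json
import urllib.request

API_URL = "https://api.blat.ai/scrape"  # endpoint taken from the curl sample


def build_scrape_request(url, token, fmt="html", extract_links=True,
                         allow_external_links=False,
                         allow_subdomain_links=False):
    """Build (but do not send) a POST request mirroring the curl sample."""
    payload = {
        "url": url,
        "format": fmt,
        "extract_links": extract_links,
        "allow_external_links": allow_external_links,
        "allow_subdomain_links": allow_subdomain_links,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_scrape_response(body):
    """Return (content, links) from a JSON body shaped like the sample."""
    data = json.loads(body)
    # Assumption: extracted_links may be missing when extract_links is false.
    return data["content"], data.get("extracted_links", [])


def scrape(url, token, **options):
    """Send the request; urllib raises HTTPError on a non-2xx status."""
    request = build_scrape_request(url, token, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_scrape_response(response.read())
```

Calling scrape("https://example.com", token, fmt="markdown") would return the page content and the list of extracted links. Keeping request building and response parsing in separate functions makes both easy to check without touching the network.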

Download the sitemap from any website

The /scrape_sitemap endpoint fetches and parses the sitemap at a given URL and returns every link it lists, including those that are nested.

Key Features:

  • Sitemap Parsing: Efficiently retrieves all URLs from a sitemap.

Sample Request:

curl --request POST \
  --url https://api.blat.ai/scrape_sitemap \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://example.com/sitemap.xml"
}'

Response Structure:

{
  "links": [
    "https://example.com/page1",
    "https://example.com/page2"
  ]
}

For more information, consult the Scrape Sitemap Endpoint Documentation.
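
As a rough illustration, the sitemap call can be wrapped in Python as follows. This is a sketch under assumptions: the helper names are hypothetical, the response is assumed to match the sample structure above, and the de-duplication step is a precaution of this sketch (in case nested sitemaps repeat a URL), not documented behavior of the endpoint.

```python
import json
import urllib.request

SITEMAP_API_URL = "https://api.blat.ai/scrape_sitemap"  # from the curl sample


def parse_sitemap_response(body):
    """Return the links from a JSON body shaped like the sample response."""
    data = json.loads(body)
    # dict.fromkeys drops repeated URLs while keeping first-seen order.
    return list(dict.fromkeys(data["links"]))


def fetch_sitemap_links(sitemap_url, token, timeout=30):
    """POST the sitemap URL to the endpoint and return the parsed links."""
    request = urllib.request.Request(
        SITEMAP_API_URL,
        data=json.dumps({"url": sitemap_url}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_sitemap_response(response.read())
```

The returned list can then be fed, one URL at a time, into calls to the /scrape endpoint.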

Practical Use Cases

  1. Content Aggregation: Use the /scrape endpoint to gather articles or blog posts from multiple sources for content curation.

  2. Market Research: Extract product details and pricing from competitor websites to inform strategic decisions.

  3. SEO Analysis: Use the /scrape_sitemap endpoint to retrieve all URLs from a competitor's sitemap and analyze their site structure and content strategy.

  4. Academic Research: Collect data from multiple online sources to build a broad and diverse dataset for research projects.
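
To make the SEO analysis case concrete, here is a small Python sketch that summarizes site structure from a list of URLs such as the links array returned by /scrape_sitemap. Grouping by the first path segment is only one possible way to approximate site sections, and count_sections is a hypothetical helper, not something Blat provides.

```python
from collections import Counter
from urllib.parse import urlparse


def count_sections(links):
    """Count URLs per top-level path segment, e.g. /blog/post-1 -> 'blog'."""
    counts = Counter()
    for link in links:
        # Drop empty pieces so "/blog/" and "/blog" are treated the same.
        segments = [part for part in urlparse(link).path.split("/") if part]
        counts[segments[0] if segments else "(root)"] += 1
    return counts


links = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/pricing",
    "https://example.com/",
]
sections = count_sections(links)
```

Here sections records two URLs under blog and one each under pricing and the site root, which gives a quick view of where a site concentrates its pages.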

Together, Blat's /scrape and /scrape_sitemap endpoints cover fetching page content and listing a site's URLs, which simplifies data extraction at scale.
