Basic Usage
Learn how to use the API for basic data extraction tasks
This guide walks you through basic data extraction with the API: submitting an extraction request, checking the job status, and reading the result.
Quick Example
Here's a simple example of how to extract data from a webpage. The request creates an extraction job, which you then check until it completes:
POST https://scrapezy.com/api/extract
Content-Type: application/json
x-api-key: your_api_key
{
"url": "https://example.com/products",
"prompt": "Extract all product information from this page including names, prices, and descriptions"
}
Response:
{
"jobId": "job_123abc",
"status": "pending"
}
Check job status:
GET https://scrapezy.com/api/v1/extract/job_123abc
x-api-key: your_api_key
Response when complete:
{
"status": "completed",
"result": {
"products": [
{
"name": "Example Product 1",
"price": "$99.99",
"description": "This is an example product description"
},
{
"name": "Example Product 2",
"price": "$149.99",
"description": "Another example product description"
}
]
}
}
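The submit-then-check flow above can be sketched in Python as follows. This is a minimal sketch rather than an official client: the function names (`http_json`, `extract`), the polling interval, and the poll limit are illustrative assumptions. The two URLs are copied exactly as they appear in the example above. Only the "pending" and "completed" statuses appear in this guide, so the sketch treats any status other than "pending" as final.

```python
import json
import time
import urllib.request

# URLs copied from the Quick Example above.
SUBMIT_URL = "https://scrapezy.com/api/extract"
STATUS_URL = "https://scrapezy.com/api/v1/extract/{job_id}"


def http_json(method, url, api_key, body=None):
    """Send one request with the x-api-key header and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("x-api-key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract(url, prompt, api_key, send=http_json, poll_seconds=2.0, max_polls=30):
    """Submit a job, then poll until its status is no longer 'pending'.

    `send` is injectable so the flow can be exercised without network access.
    `poll_seconds` and `max_polls` are arbitrary defaults, not documented values.
    """
    job = send("POST", SUBMIT_URL, api_key, {"url": url, "prompt": prompt})
    job_id = job["jobId"]  # the status response above does not repeat the ID
    for _ in range(max_polls):
        if job.get("status") != "pending":
            return job
        time.sleep(poll_seconds)
        job = send("GET", STATUS_URL.format(job_id=job_id), api_key)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

A call such as `extract("https://example.com/products", "Extract all product information", "your_api_key")` would return the final job object, whose `result` key holds the extracted data.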
Features Overview
Key Features
- Natural Language Prompts
  - Describe what data you want to extract
  - AI understands and extracts relevant information
  - No need for complex selectors or XPath
- Data Extraction
  - Structured JSON output
  - Automatic data cleaning
  - Smart field detection
- Configuration Options
  - Rate limiting
  - Retry logic
  - Proxy support
Examples
Basic Extraction
Extract specific information from a webpage:
POST https://scrapezy.com/api/extract
Content-Type: application/json
x-api-key: your_api_key
{
"url": "https://example.com",
"prompt": "Extract the main article title and author name"
}
Response:
{
"jobId": "job_456def",
"status": "completed",
"result": {
"title": "Example Article Title",
"author": "John Doe"
}
}
Extracting Multiple Items
Extract a list of items from a webpage:
POST https://scrapezy.com/api/extract
Content-Type: application/json
x-api-key: your_api_key
{
"url": "https://example.com/blog",
"prompt": "Extract all blog posts including their titles, dates, and summaries"
}
Response:
{
"jobId": "job_789ghi",
"status": "completed",
"result": {
"posts": [
{
"title": "First Blog Post",
"date": "2024-02-14",
"summary": "This is the first blog post summary"
},
{
"title": "Second Blog Post",
"date": "2024-02-13",
"summary": "This is the second blog post summary"
}
]
}
}
Schema Validation
To get a consistent data structure back, include a schema in the request. The schema lists the fields you expect, with a type and a required flag for each:
POST https://scrapezy.com/api/extract
Content-Type: application/json
x-api-key: your_api_key
{
"url": "https://example.com/products",
"prompt": "Extract all product information",
"schema": {
"name": "Product Schema",
"fields": [
{
"name": "productName",
"type": "string",
"required": true,
"description": "Name of the product"
},
{
"name": "price",
"type": "number",
"required": true,
"description": "Price in USD"
},
{
"name": "inStock",
"type": "boolean",
"required": false,
"description": "Whether the product is in stock"
}
]
}
}
You can also reference a pre-existing schema by ID:
POST https://scrapezy.com/api/extract
Content-Type: application/json
x-api-key: your_api_key
{
"url": "https://example.com/products",
"prompt": "Extract all product information",
"schemaId": "schema_abc123"
}
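If you also want to double-check extracted items on the client side, the schema fields shown above can drive a small local check. The sketch below is an illustration only: `check_item` is a hypothetical helper, not part of the API, and it covers just the three type names that appear in this guide ("string", "number", "boolean"). How the service itself enforces the schema is not described here.

```python
# Mapping from the schema type names used above to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_item(item, schema):
    """Return a list of problems found in one extracted item (empty if none)."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        value = item.get(name)
        if value is None:
            # Absent optional fields are fine; absent required ones are not.
            if field.get("required", False):
                problems.append(f"missing required field: {name}")
            continue
        expected = PYTHON_TYPES.get(field["type"])
        if expected is None:
            continue  # type name not covered by this sketch
        # bool is a subclass of int in Python, so reject it for "number".
        is_bool = isinstance(value, bool)
        ok = isinstance(value, expected) and (field["type"] == "boolean" or not is_bool)
        if not ok:
            problems.append(
                f"{name}: expected {field['type']}, got {type(value).__name__}"
            )
    return problems


product_schema = {
    "name": "Product Schema",
    "fields": [
        {"name": "productName", "type": "string", "required": True},
        {"name": "price", "type": "number", "required": True},
        {"name": "inStock", "type": "boolean", "required": False},
    ],
}
```

For example, an item whose price was extracted as the string "$9.99" would be reported as `price: expected number, got str`, which is a cue to tighten the prompt or the field description.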
Advanced Options
You can include additional options in your request:
POST https://scrapezy.com/api/extract
Content-Type: application/json
x-api-key: your_api_key
{
"url": "https://example.com/products",
"prompt": "Extract all product information",
"options": {
"bypassCache": true
}
}
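Building the request body from a dictionary and serializing it with a JSON library avoids hand-written syntax slips such as trailing commas. The helper below is a sketch: `build_payload` is a hypothetical name, and treating "schema" and "schemaId" as mutually exclusive is an assumption made for the sketch, not something this guide states.

```python
import json


def build_payload(url, prompt, schema=None, schema_id=None, bypass_cache=None):
    """Assemble the JSON body for an extraction request.

    Only keys that appear in this guide are emitted: url, prompt, schema,
    schemaId, and options.bypassCache.
    """
    # Assumption: a request carries an inline schema or a schema ID, not both.
    if schema is not None and schema_id is not None:
        raise ValueError("pass either schema or schema_id, not both")
    payload = {"url": url, "prompt": prompt}
    if schema is not None:
        payload["schema"] = schema
    if schema_id is not None:
        payload["schemaId"] = schema_id
    if bypass_cache is not None:
        payload["options"] = {"bypassCache": bypass_cache}
    return json.dumps(payload, indent=2)


body = build_payload(
    "https://example.com/products",
    "Extract all product information",
    bypass_cache=True,
)
```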
Error Responses
Here are common error responses you might encounter:
Invalid Request
HTTP/1.1 400 Bad Request
Content-Type: application/json
{
"error": {
"code": "INVALID_REQUEST",
"message": "URL is required"
}
}
Authentication Error
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
"error": {
"code": "INVALID_API_KEY",
"message": "Invalid or missing API key"
}
}
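Both error bodies above share the same shape: an "error" object with a "code" and a "message". A client can turn that into an exception, as in the sketch below. `ApiError` and `raise_for_error` are hypothetical names, and the "UNKNOWN" fallback code is an assumption for responses that do not match the documented shape.

```python
import json


class ApiError(Exception):
    """Carries the HTTP status plus the error code and message from the body."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def raise_for_error(status, body_text):
    """Raise ApiError for 4xx/5xx responses; return None otherwise."""
    if status < 400:
        return None
    try:
        parsed = json.loads(body_text)
    except ValueError:
        parsed = None  # body was not JSON
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}
    raise ApiError(
        status,
        error.get("code", "UNKNOWN"),  # fallback code is an assumption
        error.get("message", body_text),
    )
```

Callers can then branch on `err.code`, for example stopping immediately on `INVALID_API_KEY` rather than retrying.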
Best Practices
- Writing Effective Prompts
  - Be specific about what data you want
  - Include field names in your prompt
  - Specify the format you expect
- Using Schemas for Data Validation
  - Define schemas for consistent data structure
  - Use "required" fields to ensure critical data is extracted
  - Leverage schema validation to catch extraction errors early
  - Create reusable schemas for common data extraction patterns
- Error Handling
  - Always check job status
  - Implement retry logic for rate limits
  - Handle errors gracefully
- Performance
  - Use caching when possible
  - Implement proper rate limiting
  - Monitor API usage
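The retry advice under Error Handling can be sketched as exponential backoff with jitter. This guide does not show what a rate-limit response looks like, so the sketch assumes the caller's request function raises a `RateLimited` exception when it sees one (commonly HTTP 429, which is an assumption here). The names `RateLimited` and `with_retries`, and the default delays, are illustrative.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on a rate-limit response.

    Which response signals a rate limit is an assumption; HTTP 429 is typical.
    """


def with_retries(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying on RateLimited with exponential backoff and jitter.

    The delay doubles on each attempt (1s, 2s, 4s, ...) and up to 50% random
    jitter is added so that parallel clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Only rate-limit errors are retried; other failures, such as an invalid request or API key, propagate immediately because repeating them would not help.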
Next Steps
- Schema Validation Guide - Learn how to use schemas for consistent data extraction
- Advanced Usage Guide - Explore complex extraction patterns
- API Reference - Complete API documentation