Working with Datasets

Learn how to access and use Scrapezy datasets via the API



Scrapezy provides access to curated datasets through our marketplace. This guide shows how to browse and purchase datasets in the marketplace and how to access them through our API.

Dataset Marketplace

Visit our Dataset Marketplace to browse available datasets. Each dataset includes:

  • Detailed description and sample data
  • Coverage information and update frequency
  • Pricing options (one-time purchase or subscription)
  • Data quality metrics and validation rules
  • Last update date and total number of entries

Purchasing Datasets

One-Time Purchase

  • Immediate access to the current dataset version
  • Download complete dataset or access via API
  • Includes updates for the purchased version
  • Best for historical analysis or one-off projects
  • Lower upfront cost

Annual Subscription

  • Continuous access to latest data
  • Regular updates (daily/weekly/monthly)
  • Higher API rate limits
  • Priority support
  • Best for ongoing research and live applications

To purchase a dataset:

  1. Browse the Marketplace
  2. Select your desired dataset
  3. Choose purchase type (one-time or subscription)
  4. Complete the checkout process
  5. Access your dataset through the API using your API key

Authentication

Before accessing any dataset, ensure you have a valid API key with appropriate permissions. See our Authentication Guide for details on:

  • Obtaining API keys
  • Setting permissions
  • Security best practices
  • Rate limiting

Using the API

Listing Available Datasets

View all datasets you have access to:

GET https://scrapezy.com/api/datasets
x-api-key: your_api_key
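
As a rough illustration, the request above could be issued from Python with only the standard library. The URL and the `x-api-key` header come from the example; the function names `build_list_request` and `list_datasets` are hypothetical helpers, not part of any Scrapezy SDK.

```python
import json
import urllib.request

# Endpoint taken from the example request above.
DATASETS_URL = "https://scrapezy.com/api/datasets"


def build_list_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that lists accessible datasets."""
    return urllib.request.Request(
        DATASETS_URL,
        headers={"x-api-key": api_key},
        method="GET",
    )


def list_datasets(api_key: str) -> list:
    """Send the request and decode the JSON array of datasets."""
    with urllib.request.urlopen(build_list_request(api_key), timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the header logic easy to inspect without making a network call.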

The response includes dataset details and access information:

[
  {
    "id": "dataset_123",
    "name": "UK Court Records",
    "description": "Comprehensive database of UK court records",
    "category": "Legal",
    "oneTimePurchasePrice": 79.99,
    "subscriptionPrice": 599,
    "updateFrequency": "weekly",
    "lastUpdated": "2024-02-15T10:30:00Z",
    "_count": {
      "entries": 50000
    },
    "access": {
      "type": "subscription",
      "expiresAt": "2025-02-15T00:00:00Z"
    }
  }
]
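
The `access.expiresAt` field in this response can be used to check whether access is still valid before requesting entries. The sketch below assumes the timestamp format shown above; treating a missing `expiresAt` as non-expiring access is an assumption, and `access_is_active` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-02-15T00:00:00Z"."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def access_is_active(dataset: dict, now: datetime) -> bool:
    """Return True if the dataset's access has not expired at `now` (timezone-aware)."""
    expires_at = dataset.get("access", {}).get("expiresAt")
    if expires_at is None:
        # Assumption: no expiry field means the access does not expire.
        return True
    return now < parse_timestamp(expires_at)


# Usage with the sample entry from the response above.
sample = {
    "id": "dataset_123",
    "access": {"type": "subscription", "expiresAt": "2025-02-15T00:00:00Z"},
}
still_valid = access_is_active(sample, datetime.now(timezone.utc))
```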

Accessing Dataset Entries

Retrieve entries from a purchased dataset:

GET https://scrapezy.com/api/datasets/dataset_123
x-api-key: your_api_key

Query Parameters

  • page (default: 1): Page number
  • limit (default: 100): Entries per page
  • sort: Sort field (e.g., "createdAt")
  • order: Sort order ("asc" or "desc")
  • filter: JSON object for filtering entries (URL-encode it when building the request)

Example with filtering:

GET https://scrapezy.com/api/datasets/dataset_123?page=1&limit=100&sort=createdAt&order=desc&filter={"year":2023}
x-api-key: your_api_key
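
The JSON in `filter` contains characters such as `{`, `"`, and `:` that need percent-encoding in a real URL. A minimal sketch of building the request URL follows, assuming the API accepts the filter as a URL-encoded JSON string as the example suggests; `build_entries_url` is a hypothetical helper name.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode

# Base endpoint taken from the examples above.
BASE_URL = "https://scrapezy.com/api/datasets"


def build_entries_url(
    dataset_id: str,
    page: int = 1,
    limit: int = 100,
    sort: Optional[str] = None,
    order: Optional[str] = None,
    filters: Optional[dict] = None,
) -> str:
    """Build the entries URL with percent-encoded query parameters."""
    params = {"page": page, "limit": limit}
    if sort:
        params["sort"] = sort
    if order:
        params["order"] = order
    if filters is not None:
        # Serialize the filter as compact JSON; urlencode() percent-encodes it.
        params["filter"] = json.dumps(filters, separators=(",", ":"))
    return f"{BASE_URL}/{quote(dataset_id, safe='')}?{urlencode(params)}"


url = build_entries_url(
    "dataset_123", sort="createdAt", order="desc", filters={"year": 2023}
)
# url ends with: filter=%7B%22year%22%3A2023%7D
```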

The response includes entries and pagination metadata:

{
  "data": [...],
  "pagination": {
    "currentPage": 1,
    "totalPages": 500,
    "totalItems": 50000,
    "itemsPerPage": 100
  }
}
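
The `pagination` block makes it possible to walk a whole dataset page by page. The following is a sketch of that loop, assuming the response shape shown above; `fetch_page` is a hypothetical callable that performs the HTTP request for a given page number and returns the decoded JSON body.

```python
from typing import Callable, Iterator


def iter_entries(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every entry, requesting pages until totalPages is reached."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["data"]
        # Stop once the last page reported by the API has been consumed.
        if page >= body["pagination"]["totalPages"]:
            break
        page += 1


# Usage with an in-memory stand-in for the API (3 pages of 2 entries each).
def fake_fetch(page: int) -> dict:
    start = (page - 1) * 2
    return {
        "data": [{"n": start}, {"n": start + 1}],
        "pagination": {
            "currentPage": page,
            "totalPages": 3,
            "totalItems": 6,
            "itemsPerPage": 2,
        },
    }


all_entries = list(iter_entries(fake_fetch))
```

Because the function is a generator, entries can be processed as they arrive instead of holding the full dataset in memory.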

Error Responses

Here are the common error responses you might encounter:

Invalid API Key

HTTP/1.1 401 Unauthorized
Content-Type: application/json
 
{
  "error": {
    "code": "INVALID_API_KEY",
    "message": "The provided API key is invalid or has expired"
  }
}

Access Denied

HTTP/1.1 403 Forbidden
Content-Type: application/json
 
{
  "error": {
    "code": "ACCESS_DENIED",
    "message": "You don't have access to this dataset"
  }
}

Dataset Not Found

HTTP/1.1 404 Not Found
Content-Type: application/json
 
{
  "error": {
    "code": "DATASET_NOT_FOUND",
    "message": "The requested dataset was not found"
  }
}
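
The error bodies above share one shape (`error.code` and `error.message`), so a client can turn them into a single exception type. This is a sketch under that assumption; `ScrapezyAPIError` and `raise_for_error` are hypothetical names, and the `"UNKNOWN"` fallback code is an illustration for bodies that do not match the documented shape.

```python
import json


class ScrapezyAPIError(Exception):
    """Hypothetical exception carrying the HTTP status and API error code."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def raise_for_error(status: int, body: str) -> None:
    """Raise ScrapezyAPIError for 4xx/5xx responses; do nothing otherwise."""
    if status < 400:
        return
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    # Fall back to placeholder values when the body is not in the documented shape.
    raise ScrapezyAPIError(
        status, error.get("code", "UNKNOWN"), error.get("message", body)
    )
```

Callers can then branch on `exc.code`, for example treating `ACCESS_DENIED` as a prompt to check the purchase or subscription status.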

Best Practices

  1. Efficient Data Retrieval

    • Use pagination to handle large datasets
    • Cache frequently accessed data
    • Implement retry logic with exponential backoff
    • Use appropriate filters to minimize data transfer
  2. Error Handling

    • Always check response status codes
    • Implement proper error handling for rate limits
    • Log and monitor API usage
    • Handle subscription expiration gracefully
  3. Security

    • Keep your API key secure
    • Use environment variables for API keys
    • Regularly rotate API keys
    • Monitor API key usage
  4. Data Management

    • Implement local caching for frequently accessed data
    • Track dataset versions and updates
    • Set up webhooks for update notifications
    • Back up critical data regularly
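
Two of the practices above, reading the API key from an environment variable and retrying with exponential backoff, can be sketched as follows. The variable name `SCRAPEZY_API_KEY`, the helper names, the default delays, and the choice of which exceptions count as retryable are all assumptions for illustration.

```python
import os
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def load_api_key(env_var: str = "SCRAPEZY_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable")
    return key


def with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retryable` errors with delays of 1x, 2x, 4x, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without actually waiting.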

Support

If you encounter any issues or have questions about datasets:

  1. Check our API Reference for detailed endpoint documentation
  2. Contact us at [email protected] for dataset-specific inquiries
