Working with Datasets
Learn how to access and use Scrapezy datasets via the API
Scrapezy provides access to curated datasets through its marketplace. This guide shows how to browse, purchase, and use datasets via the API.
Dataset Marketplace
Visit our Dataset Marketplace to browse available datasets. Each dataset includes:
- Detailed description and sample data
- Coverage information and update frequency
- Pricing options (one-time purchase or subscription)
- Data quality metrics and validation rules
- Last update date and total number of entries
Purchasing Datasets
One-Time Purchase
- Immediate access to the current dataset version
- Download complete dataset or access via API
- Includes updates for the purchased version
- Best for historical analysis or one-off projects
- Lower upfront cost
Annual Subscription
- Continuous access to latest data
- Regular updates (daily/weekly/monthly)
- Higher API rate limits
- Priority support
- Best for ongoing research and live applications
To purchase a dataset:
- Browse the Marketplace
- Select your desired dataset
- Choose purchase type (one-time or subscription)
- Complete checkout process
- Access your dataset through the API using your API key
Authentication
Before accessing any dataset, ensure you have a valid API key with appropriate permissions. See our Authentication Guide for details on:
- Obtaining API keys
- Setting permissions
- Security best practices
- Rate limiting
Using the API
Listing Available Datasets
View all datasets you have access to:
GET https://scrapezy.com/api/datasets
x-api-key: your_api_key
The response is a JSON array. Each item includes the dataset details and your access information:
[
{
"id": "dataset_123",
"name": "UK Court Records",
"description": "Comprehensive database of UK court records",
"category": "Legal",
"oneTimePurchasePrice": 79.99,
"subscriptionPrice": 599,
"updateFrequency": "weekly",
"lastUpdated": "2024-02-15T10:30:00Z",
"_count": {
"entries": 50000
},
"access": {
"type": "subscription",
"expiresAt": "2025-02-15T00:00:00Z"
}
}
]
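A minimal Python sketch of this request, using only the standard library. The endpoint and the x-api-key header come from the example above; the function names are illustrative, and days_until_expiry is a hypothetical helper that reads access.expiresAt so a client can warn before a subscription lapses.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

BASE_URL = "https://scrapezy.com/api/datasets"


def build_list_request(api_key: str) -> urllib.request.Request:
    """Build the GET request for the dataset listing endpoint."""
    return urllib.request.Request(
        BASE_URL, headers={"x-api-key": api_key}, method="GET"
    )


def list_datasets(api_key: str) -> list:
    """Send the request and decode the JSON array of datasets."""
    request = build_list_request(api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def days_until_expiry(dataset: dict, now: datetime) -> Optional[int]:
    """Return whole days of access left, or None if no expiry is present."""
    expires_at = (dataset.get("access") or {}).get("expiresAt")
    if not expires_at:
        return None
    # Swap the trailing "Z" for an explicit offset so that
    # fromisoformat also accepts it on Python versions before 3.11.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return (expiry - now).days
```

Reading the key from an environment variable (for example a hypothetical SCRAPEZY_API_KEY) and passing it to list_datasets keeps it out of source code. Passing datetime.now(timezone.utc) as now gives the days remaining at the time of the call.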
Accessing Dataset Entries
Retrieve entries from a purchased dataset:
GET https://scrapezy.com/api/datasets/dataset_123
x-api-key: your_api_key
Query Parameters
- page (default: 1): Page number
- limit (default: 100): Entries per page
- sort: Sort field (e.g., "createdAt")
- order: Sort order ("asc" or "desc")
- filter: JSON object for filtering entries
Example with filtering:
GET https://scrapezy.com/api/datasets/dataset_123?page=1&limit=100&sort=createdAt&order=desc&filter={"year":2023}
x-api-key: your_api_key
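The filter value in the request above contains braces and quotes, which most HTTP clients need percent-encoded before sending. The sketch below builds the same URL in Python; build_entries_url is a hypothetical helper, and it assumes the server accepts the filter as a percent-encoded JSON string.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://scrapezy.com/api/datasets"


def build_entries_url(dataset_id, page=1, limit=100,
                      sort=None, order=None, filter_obj=None):
    """Return the entries URL with every query parameter percent-encoded."""
    params = {"page": page, "limit": limit}
    if sort is not None:
        params["sort"] = sort
    if order is not None:
        if order not in ("asc", "desc"):
            raise ValueError('order must be "asc" or "desc"')
        params["order"] = order
    if filter_obj is not None:
        # Compact JSON keeps the query string short; urlencode then
        # escapes the braces, quotes, and colon.
        params["filter"] = json.dumps(filter_obj, separators=(",", ":"))
    return f"{BASE_URL}/{dataset_id}?{urlencode(params)}"


url = build_entries_url(
    "dataset_123", sort="createdAt", order="desc", filter_obj={"year": 2023}
)
print(url)
# https://scrapezy.com/api/datasets/dataset_123?page=1&limit=100&sort=createdAt&order=desc&filter=%7B%22year%22%3A2023%7D
```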
The response includes the entries and pagination metadata:
{
"data": [...],
"pagination": {
"currentPage": 1,
"totalPages": 500,
"totalItems": 50000,
"itemsPerPage": 100
}
}
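In the sample above, totalPages follows from the other fields: 50000 items at 100 per page gives 500 pages. A client can walk all pages by reading that metadata. The generator below is a sketch under one assumption: fetch_page is a caller-supplied function (not part of the API) that takes a page number and returns the decoded JSON body for that page.

```python
from typing import Callable, Iterator


def iter_entries(fetch_page: Callable[[int], dict],
                 start_page: int = 1) -> Iterator[dict]:
    """Yield every entry, following pagination.totalPages page by page."""
    page = start_page
    while True:
        body = fetch_page(page)
        yield from body.get("data", [])
        # If the metadata is missing, treat the current page as the last one.
        total_pages = body.get("pagination", {}).get("totalPages", page)
        if page >= total_pages:
            break
        page += 1
```

Because this is a generator, entries are processed as each page arrives rather than after all pages have been held in memory.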
Error Responses
Here are the common error responses you might encounter:
Invalid API Key
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
"error": {
"code": "INVALID_API_KEY",
"message": "The provided API key is invalid or has expired"
}
}
Access Denied
HTTP/1.1 403 Forbidden
Content-Type: application/json
{
"error": {
"code": "ACCESS_DENIED",
"message": "You don't have access to this dataset"
}
}
Dataset Not Found
HTTP/1.1 404 Not Found
Content-Type: application/json
{
"error": {
"code": "DATASET_NOT_FOUND",
"message": "The requested dataset was not found"
}
}
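All three responses share the same error.code and error.message shape, so a client can turn them into one exception type. The sketch below is illustrative: ScrapezyAPIError and parse_error are hypothetical names, and the "UNKNOWN" fallback for bodies that do not match the documented shape is an assumption.

```python
import json


class ScrapezyAPIError(Exception):
    """Carries the HTTP status plus the code and message from the error body."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: str) -> ScrapezyAPIError:
    """Build an exception from a non-2xx status and its raw response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # body was not JSON (e.g. a proxy error page)
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return ScrapezyAPIError(
        status, error.get("code", "UNKNOWN"), error.get("message", body)
    )
```

Calling code can then branch on err.code, for example prompting for a new key on INVALID_API_KEY or checking the dataset id on DATASET_NOT_FOUND.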
Best Practices
Efficient Data Retrieval
- Use pagination to handle large datasets
- Cache frequently accessed data
- Implement retry logic with exponential backoff
- Use appropriate filters to minimize data transfer
Error Handling
- Always check response status codes
- Implement proper error handling for rate limits
- Log and monitor API usage
- Handle subscription expiration gracefully
Security
- Keep your API key secure
- Use environment variables for API keys
- Regularly rotate API keys
- Monitor API key usage
Data Management
- Implement local caching for frequently accessed data
- Track dataset versions and updates
- Set up webhooks for update notifications
- Back up critical data regularly
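The retry advice under Efficient Data Retrieval can be sketched as follows. This is one common form of exponential backoff (doubling delay, capped, with random jitter), not a documented Scrapezy client. Which failures count as retryable is left to the caller through is_retryable, since this guide does not specify the status code returned for rate limits.

```python
import random
import time


def retry_with_backoff(call, is_retryable, max_attempts=5,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run call() until it succeeds or the attempts are used up.

    The wait before retry n is a random value between 0 and
    min(max_delay, base_delay * 2**n).
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that will not go away.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out clients that failed at the same moment.
            sleep(random.uniform(0, delay))
```

The sleep parameter exists so the waiting can be replaced in tests. In normal use, call would wrap a single API request and is_retryable would return True only for transient failures such as timeouts.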
Support
If you encounter any issues or have questions about datasets:
- Check our API Reference for detailed endpoint documentation
- Contact us at [email protected] for dataset-specific inquiries