MCP Server Integration

The Scrapezy MCP (Model Context Protocol) Server lets you integrate Scrapezy's web scraping and data extraction capabilities directly into AI tools such as Claude Desktop and Cursor. It is a remote server that you connect to, so no local installation is required.

What is MCP?

The Model Context Protocol (MCP) is an open standard that enables AI applications to securely connect to external data sources and services. The Scrapezy MCP Server provides AI tools with direct access to:

Data Extraction: Extract structured data from any public website
Scraper Management: Trigger and monitor configured scrapers
Job Monitoring: Track extraction progress and results
Result Retrieval: Get extracted data in structured formats

Quick Setup

1. Authentication Setup

The Scrapezy MCP Server uses OAuth 2.0 for secure authentication. No manual setup is required: the first time you use the MCP tools, you are prompted to authenticate in your browser.

2. OAuth Configuration

For clients that support OAuth, add the following entry to the client's MCP configuration:

{
  "mcpServers": {
    "scrapezy": {
      "url": "https://mcp.scrapezy.com"
    }
  }
}
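
If the client already has other servers configured, the entry can be merged into the existing file instead of replacing it. The following is a minimal Python sketch, assuming the client stores its configuration as JSON with a top-level `mcpServers` object as shown above. The helper name `add_scrapezy_server` is illustrative, and the location of the configuration file depends on the client.

```python
import json

SCRAPEZY_ENTRY = {"url": "https://mcp.scrapezy.com"}


def add_scrapezy_server(config_text: str) -> str:
    """Return config_text with the Scrapezy entry merged into "mcpServers".

    Servers that are already configured are preserved. This helper is an
    illustration only, not part of any Scrapezy tooling.
    """
    # Treat an empty file as an empty configuration object.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["scrapezy"] = dict(SCRAPEZY_ENTRY)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"mcpServers": {"other": {"command": "other-server"}}}'
    print(add_scrapezy_server(existing))
```

Back up the original configuration file before writing the merged result to it.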

3. Restart and Test

Restart Claude Desktop or Cursor
Start a new conversation
Try: "Extract the headlines from https://example-news.com"
When prompted, complete the OAuth authentication in your browser

Available Tools

The remote MCP server provides these tools to AI applications:

`start_extraction`

Extract structured data from any public website using natural language prompts.

Example prompt to AI: "Extract product names and prices from https://example-store.com/products"
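
Behind a prompt like this, an MCP client invokes the tool by sending a JSON-RPC 2.0 `tools/call` request. The Python sketch below builds such a request. The `tools/call` envelope follows the MCP specification, but the argument names `url` and `prompt` are assumptions for illustration: the tool's actual input schema is whatever the server advertises, and it is not documented on this page.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 tools/call request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "start_extraction",
    {
        # Hypothetical argument names; check the schema the server advertises.
        "url": "https://example-store.com/products",
        "prompt": "Extract product names and prices",
    },
)
print(json.dumps(request, indent=2))
```

In normal use the AI client constructs this request for you; the sketch only shows what travels over the wire.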

`trigger_scraper`

Run a preconfigured scraper by its ID.

Example prompt to AI: "Run scraper ID 'scraper_abc123' to get the latest data"

`get_job_status`

Check the progress of any running extraction or scraping job.

Example prompt to AI: "Check the status of job 'job_xyz789'"

`get_job_results`

Retrieve results from completed extraction or scraping jobs.

Example prompt to AI: "Get the results from job 'job_xyz789'"

`get_recent_scraper_results`

Get the most recent results from a specific scraper without knowing job IDs.

Example prompt to AI: "Get the 5 most recent results from scraper 'scraper_abc123'"

Usage Examples

Basic Data Extraction

Simply describe what data you want to extract:

You: "Extract the top 10 news headlines from https://example-news.com along with their publication dates"

The AI will use the remote MCP server to:

Call `start_extraction` with the URL and your request
Monitor the job progress with `get_job_status`
Retrieve and format the results with `get_job_results`
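
The three steps above amount to a start, poll, fetch loop. The following is a minimal Python sketch, assuming a generic `call_tool(name, arguments)` function supplied by your MCP client library. The argument names (`url`, `prompt`, `jobId`), the `status` field, and its values `completed` and `failed` are hypothetical placeholders, not Scrapezy's documented schema.

```python
import time
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def extract(call_tool: ToolCaller, url: str, prompt: str,
            poll_seconds: float = 2.0, max_polls: int = 30) -> dict[str, Any]:
    """Start an extraction, poll until it finishes, then return its results."""
    job = call_tool("start_extraction", {"url": url, "prompt": prompt})
    job_id = job["jobId"]  # hypothetical field name

    for _ in range(max_polls):
        status = call_tool("get_job_status", {"jobId": job_id})["status"]
        if status == "completed":  # hypothetical status value
            return call_tool("get_job_results", {"jobId": job_id})
        if status == "failed":  # hypothetical status value
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_seconds)

    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Stand-in for a real MCP client so the sketch runs on its own.
def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    responses = {
        "start_extraction": {"jobId": "job_xyz789"},
        "get_job_status": {"status": "completed"},
        "get_job_results": {"items": [{"headline": "Example"}]},
    }
    return responses[name]


print(extract(fake_call_tool, "https://example-news.com",
              "Top 10 headlines", poll_seconds=0))
```

Bounding the number of polls keeps a stuck job from blocking the caller indefinitely.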

E-commerce Price Monitoring

You: "Get current prices for all products on https://competitor.com/products and compare them to our pricing"

The AI will extract the pricing data and can help you analyze competitive positioning.

Content Aggregation

You: "Extract all blog post titles, authors, and summaries from https://company-blog.com and create a content calendar"

The AI will gather the content data and help organize it into a useful format.

Using Existing Scrapers

You: "Run my 'daily-news' scraper and summarize today's top stories"

If you have preconfigured scrapers, the AI can trigger them and process the results.

Advanced Options

Proxy Rotation

Avoid rate limits and IP blocks by using proxy rotation:

You: "Extract product data from https://example-store.com/products using proxy rotation"

The AI will include `useProxy: true` in the extraction request to route it through Scrapezy's proxy network.

When to use proxies:

Scraping sites with aggressive rate limiting
Large-scale data extraction
Sites that block repeated requests from the same IP

Cache Control

Force fresh extraction by bypassing the cache:

You: "Extract the latest price from https://example.com/product, bypass cache"

The AI will include `bypassCache: true` to ensure you get the most current data.
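
Both options in this section are flags on the extraction request. The Python sketch below shows how the arguments might be assembled. `useProxy` and `bypassCache` are the flag names given above, while `url`, `prompt`, and the helper `extraction_arguments` are assumptions for illustration.

```python
from typing import Any


def extraction_arguments(url: str, prompt: str, *, use_proxy: bool = False,
                         bypass_cache: bool = False) -> dict[str, Any]:
    """Assemble start_extraction arguments, adding optional flags only when set."""
    arguments: dict[str, Any] = {"url": url, "prompt": prompt}  # hypothetical names
    if use_proxy:
        arguments["useProxy"] = True     # route through Scrapezy's proxy network
    if bypass_cache:
        arguments["bypassCache"] = True  # force a fresh extraction
    return arguments


print(extraction_arguments(
    "https://example.com/product",
    "Extract the latest price",
    bypass_cache=True,
))
```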

Server Details

Remote Server URL

Production: https://mcp.scrapezy.com

Authentication

The MCP server uses OAuth 2.0 for secure authentication. When you first connect, you'll be redirected to Scrapezy's login page to authorize the connection. The OAuth flow will automatically handle token management and refresh.

Troubleshooting

Common Issues

"Authentication failed"

Complete the OAuth authorization flow in your browser
Ensure you have the required permissions for the requested scopes
Check that OAuth tokens haven't expired

"Connection timeout"

Check your internet connection
Verify the server URL is correct (https://mcp.scrapezy.com)
Ensure your firewall allows HTTPS connections
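
To separate network problems from client configuration problems, you can check whether the server answers HTTPS requests at all. The following is a rough Python sketch using only the standard library. The helper `check_server` and its `opener` parameter are illustrative, and the status code returned for an unauthenticated request is not documented here, so any HTTP response is treated as proof of connectivity.

```python
import urllib.error
import urllib.request
from typing import Callable

SERVER_URL = "https://mcp.scrapezy.com"


def check_server(url: str = SERVER_URL, timeout: float = 10.0,
                 opener: Callable = urllib.request.urlopen) -> str:
    """Report whether an HTTPS request to the server gets any response."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"reachable (HTTP {response.status})"
    except urllib.error.HTTPError as err:
        # An HTTP error status still proves that DNS, TLS, and the
        # firewall path are working; only the request itself was rejected.
        return f"reachable (HTTP {err.code})"
    except OSError as err:
        # URLError, timeouts, and TLS failures are all OSError subclasses.
        return f"unreachable: {err}"


if __name__ == "__main__":
    print(check_server())
```

If the result is "unreachable", the problem lies in DNS, the network, or a firewall rather than in the MCP client.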

"Tool execution failed"

Check that you have sufficient credits in your Scrapezy account
Verify the target website is publicly accessible
Review error messages for specific issues

Getting Help

If you encounter issues:

Check the troubleshooting guide
Contact support through the help center

Best Practices

Security

OAuth tokens are automatically managed and refreshed
Tokens are short-lived for enhanced security
Review and revoke OAuth applications as needed in your dashboard
Monitor OAuth usage and active connections

Performance

Use specific, clear prompts for better extraction accuracy
Leverage existing scrapers for recurring data extraction needs
Monitor your credit usage and set up alerts

Integration

Start with simple extractions to test your setup
Gradually build more complex workflows
Document your common extraction patterns for reuse
