MCP Server Integration
Connect Scrapezy to Claude Desktop, Cursor, and other MCP-compatible AI tools for seamless data extraction
The Scrapezy MCP (Model Context Protocol) Server lets you bring Scrapezy's web scraping and data extraction capabilities directly into AI tools like Claude Desktop and Cursor. It is a remote server that you connect to, so no local installation is required.
What is MCP?
The Model Context Protocol (MCP) is an open standard that enables AI applications to securely connect to external data sources and services. The Scrapezy MCP Server provides AI tools with direct access to:
- Data Extraction: Extract structured data from any public website
- Scraper Management: Trigger and monitor configured scrapers
- Job Monitoring: Track extraction progress and results
- Result Retrieval: Get extracted data in structured formats
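Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages, and a client discovers a server's tools with the tools/list method. The sketch below only builds such a request locally to show its shape. It does not contact the Scrapezy server, and make_request is a hypothetical helper, not part of any MCP SDK.

```python
import json
from itertools import count
from typing import Optional

# Each JSON-RPC request needs an id that is unique within the session.
_request_ids = count(1)


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object of the kind MCP clients send."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Ask the server which tools it exposes (for Scrapezy: start_extraction,
# trigger_scraper, get_job_status, and so on).
list_tools = make_request("tools/list")
print(json.dumps(list_tools))  # {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

Your AI tool sends messages like this for you; you never need to write them by hand to use Scrapezy.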
Quick Setup
1. Authentication Setup
The Scrapezy MCP Server uses OAuth 2.0 for secure authentication. No manual setup is required: the first time you use the MCP tools, you'll be prompted to authenticate in your browser.
2. OAuth Configuration
For clients that support connecting to remote MCP servers with OAuth, add the following to the client's MCP configuration:
{
  "mcpServers": {
    "scrapezy": {
      "url": "https://mcp.scrapezy.com"
    }
  }
}
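If you prefer to add the entry with a script rather than by hand, the merge can be sketched in Python as below. It assumes the client stores its servers under a top-level "mcpServers" key, as in the configuration snippet, and uses the production URL listed under Server Details. The file path you pass to update_config_file is up to you, since each client keeps its configuration in its own location, and the "other-server" entry is a placeholder.

```python
import json
from pathlib import Path

# Production URL from the Server Details section.
SCRAPEZY_URL = "https://mcp.scrapezy.com"


def add_scrapezy_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a "scrapezy" entry added.

    Servers already listed under "mcpServers" are left untouched.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    updated.setdefault("mcpServers", {})["scrapezy"] = {"url": SCRAPEZY_URL}
    return updated


def update_config_file(path: Path) -> None:
    """Add the Scrapezy entry to a JSON config file, creating the file if needed."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_scrapezy_server(config), indent=2) + "\n")


# Placeholder config with one unrelated server already configured.
existing = {"mcpServers": {"other-server": {"url": "https://example.com/mcp"}}}
print(json.dumps(add_scrapezy_server(existing), indent=2))
```

Copying the config before modifying it keeps the caller's dictionary unchanged, so you can compare old and new versions before writing the file.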
3. Restart and Test
- Restart Claude Desktop or Cursor
- Start a new conversation
- Try: "Extract the headlines from https://news.bbc.co.uk"
- When prompted, complete the OAuth authentication in your browser
Available Tools
The remote MCP server provides these tools to AI applications:
start_extraction
Extract structured data from any public website using natural language prompts.
Example prompt to AI: "Extract product names and prices from https://example-store.com/products"
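When the AI decides to use start_extraction, its MCP client sends a JSON-RPC tools/call request naming the tool and its arguments. A minimal sketch of that request follows. The argument names url and prompt are hypothetical; the actual parameter names are defined by the inputSchema the server returns for each tool from tools/list.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 object."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names; the real ones come from the tool's inputSchema.
request = tool_call_request(
    2,
    "start_extraction",
    {
        "url": "https://example-store.com/products",
        "prompt": "Extract product names and prices",
    },
)
print(json.dumps(request, indent=2))
```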
trigger_scraper
Run a preconfigured scraper by its ID.
Example prompt to AI: "Run scraper ID 'scraper_abc123' to get the latest data"
get_job_status
Check the progress of any running extraction or scraping job.
Example prompt to AI: "Check the status of job 'job_xyz789'"
get_job_results
Retrieve results from completed extraction or scraping jobs.
Example prompt to AI: "Get the results from job 'job_xyz789'"
get_recent_scraper_results
Get the most recent results from a specific scraper without knowing job IDs.
Example prompt to AI: "Get the 5 most recent results from scraper 'scraper_abc123'"
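The example prompts above map to tool calls roughly as follows. This sketch checks the tool name and argument names before building the tools/call parameters. The argument names in TOOL_ARGS are hypothetical placeholders, not Scrapezy's documented schema; the authoritative names are in each tool's inputSchema.

```python
# Hypothetical argument names per documented tool; the authoritative names are
# in each tool's inputSchema from the server's tools/list response.
TOOL_ARGS = {
    "start_extraction": {"url", "prompt"},
    "trigger_scraper": {"scraper_id"},
    "get_job_status": {"job_id"},
    "get_job_results": {"job_id"},
    "get_recent_scraper_results": {"scraper_id", "limit"},
}


def build_call(name: str, **arguments) -> dict:
    """Return tools/call params after checking the tool and argument names."""
    if name not in TOOL_ARGS:
        raise ValueError(f"unknown tool: {name}")
    unexpected = set(arguments) - TOOL_ARGS[name]
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {sorted(unexpected)}")
    return {"name": name, "arguments": arguments}


# "Get the 5 most recent results from scraper 'scraper_abc123'"
params = build_call("get_recent_scraper_results", scraper_id="scraper_abc123", limit=5)
print(params)
```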
Usage Examples
Basic Data Extraction
Simply describe what data you want to extract:
You: "Extract the top 10 news headlines from https://techcrunch.com along with their publication dates"
The AI will use the remote MCP server to:
- Call start_extraction with the URL and your request
- Monitor the job progress with get_job_status
- Retrieve and format the results with get_job_results
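The start, poll, and fetch sequence can be sketched as a loop. Here call_tool is a stand-in for whatever MCP client you use, and the response shapes (a job_id field, and status values such as "completed" and "failed") are assumptions for illustration, not Scrapezy's documented output.

```python
import time
from typing import Any, Callable

# Stand-in for an MCP client: call_tool(name, arguments) -> tool result dict.
ToolCaller = Callable[[str, dict], dict]


def extract_and_wait(call_tool: ToolCaller, url: str, prompt: str,
                     poll_seconds: float = 2.0, timeout: float = 120.0) -> Any:
    """Start an extraction, poll its status, then fetch the results.

    Assumes (hypothetically) that start_extraction returns {"job_id": ...} and
    get_job_status returns {"status": ...} with "completed" or "failed" as
    terminal values.
    """
    job_id = call_tool("start_extraction", {"url": url, "prompt": prompt})["job_id"]
    deadline = time.monotonic() + timeout
    while True:
        status = call_tool("get_job_status", {"job_id": job_id})["status"]
        if status == "completed":
            return call_tool("get_job_results", {"job_id": job_id})
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        time.sleep(poll_seconds)


def make_fake_caller() -> ToolCaller:
    """Offline stand-in that reports "running" once, then "completed"."""
    statuses = iter(["running", "completed"])

    def call(name: str, arguments: dict) -> dict:
        if name == "start_extraction":
            return {"job_id": "job_xyz789"}
        if name == "get_job_status":
            return {"status": next(statuses)}
        return {"job_id": arguments["job_id"], "items": [{"headline": "Example"}]}

    return call


result = extract_and_wait(make_fake_caller(), "https://techcrunch.com",
                          "Top 10 headlines with publication dates", poll_seconds=0)
print(result)
```

A fixed polling interval with an overall deadline keeps the sketch simple; a real client might back off between polls to reduce load.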
E-commerce Price Monitoring
You: "Get current prices for all products on https://competitor.com/products and compare them to our pricing"
The AI will extract the pricing data and can help you analyze competitive positioning.
Content Aggregation
You: "Extract all blog post titles, authors, and summaries from https://company-blog.com and create a content calendar"
The AI will gather the content data and help organize it into a useful format.
Using Existing Scrapers
You: "Run my 'daily-news' scraper and summarize today's top stories"
If you have preconfigured scrapers, the AI can trigger them and process the results.
Server Details
Remote Server URL
- Production: https://mcp.scrapezy.com
Authentication
The MCP server uses OAuth 2.0 for secure authentication. When you first connect, you'll be redirected to Scrapezy's login page to authorize the connection. The OAuth flow will automatically handle token management and refresh.
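Your MCP client performs token refresh for you, so you don't need to write this yourself. For intuition, the usual pattern is to refresh an access token shortly before it expires, as sketched below. The refresh callable stands in for a request to the authorization server's token endpoint, which this page doesn't document, and the sketch assumes the token response includes expires_in. Field names follow the standard OAuth 2.0 token response (RFC 6749).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    expires_at: float  # absolute time, in seconds since the epoch
    refresh_token: Optional[str] = None


def token_from_response(resp: dict, now: float) -> Token:
    """Convert an OAuth 2.0 token response into a Token (assumes expires_in is present)."""
    return Token(resp["access_token"], now + resp["expires_in"], resp.get("refresh_token"))


def valid_token(token: Token, refresh: Callable[[str], dict],
                skew: float = 60.0, now: Optional[float] = None) -> Token:
    """Return a usable token, refreshing it if it expires within `skew` seconds."""
    now = time.time() if now is None else now
    if token.expires_at - skew > now:
        return token  # still comfortably valid
    if token.refresh_token is None:
        raise RuntimeError("token expired and cannot be refreshed; sign in again")
    new = token_from_response(refresh(token.refresh_token), now)
    if new.refresh_token is None:
        # The server may not issue a new refresh token; keep using the previous one.
        new.refresh_token = token.refresh_token
    return new


def fake_refresh(refresh_token: str) -> dict:
    """Offline stand-in for the token endpoint."""
    return {"access_token": "access-2", "expires_in": 3600}


old = Token("access-1", expires_at=1_000.0, refresh_token="refresh-1")
print(valid_token(old, fake_refresh, now=500.0).access_token)  # access-1
print(valid_token(old, fake_refresh, now=950.0).access_token)  # access-2
```

Refreshing a little early (the skew) avoids sending a token that expires while the request is in flight.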
Troubleshooting
Common Issues
"Authentication failed"
- Complete the OAuth authorization flow in your browser
- Ensure you have the required permissions for the requested scopes
- Check that OAuth tokens haven't expired
"Connection timeout"
- Check your internet connection
- Verify the server URL is correct (https://mcp.scrapezy.com)
- Ensure your firewall allows HTTPS connections
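To rule out a mistyped URL or a blocked network path, a quick check along these lines can help. It is only a sketch: check_server_url compares the configured URL against the production URL listed under Server Details, and can_reach tests whether a TCP connection to port 443 succeeds, not whether the MCP server itself is healthy.

```python
import socket
from urllib.parse import urlsplit

# Production host from the Server Details section.
EXPECTED_HOST = "mcp.scrapezy.com"


def check_server_url(url: str) -> list:
    """Return a list of problems with a configured Scrapezy MCP URL (empty if none)."""
    problems = []
    parts = urlsplit(url.strip())
    if parts.scheme != "https":
        problems.append(f"scheme is {parts.scheme!r}, expected 'https'")
    if parts.hostname != EXPECTED_HOST:
        problems.append(f"host is {parts.hostname!r}, expected {EXPECTED_HOST!r}")
    return problems


def can_reach(host: str = EXPECTED_HOST, port: int = 443, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection to the HTTPS port; False points to a network or firewall issue."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


for problem in check_server_url("http://mcp.scrapezy.com"):
    print(problem)  # scheme is 'http', expected 'https'
```

Call can_reach() from the machine running your AI tool; if it returns False while other HTTPS sites load, a firewall or proxy is the likely cause.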
"Tool execution failed"
- Check that you have sufficient credits in your Scrapezy account
- Verify the target website is publicly accessible
- Read the error message returned by the tool for details on the specific failure
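MCP reports failures in two ways: a JSON-RPC error object for protocol-level problems (such as an unknown tool), or a normal result with isError set to true and a text description in its content. The hypothetical helper below pulls out whichever message is present so you can read it. The sample error text is illustrative, not a real Scrapezy message.

```python
from typing import Optional


def tool_error_message(response: dict) -> Optional[str]:
    """Return an error description from an MCP tools/call response, or None on success."""
    if "error" in response:
        # Protocol-level JSON-RPC error, e.g. an unknown tool or invalid arguments.
        err = response["error"]
        return f"protocol error {err.get('code')}: {err.get('message')}"
    result = response.get("result", {})
    if result.get("isError"):
        # Tool-level failure: the explanation is carried in text content items.
        texts = [item.get("text", "") for item in result.get("content", [])
                 if item.get("type") == "text"]
        return "\n".join(texts) or "tool reported an error without details"
    return None


# Illustrative responses; the error text is made up.
failed = {"jsonrpc": "2.0", "id": 3,
          "result": {"content": [{"type": "text", "text": "Not enough credits"}], "isError": True}}
ok = {"jsonrpc": "2.0", "id": 4,
      "result": {"content": [{"type": "text", "text": "{}"}], "isError": False}}
print(tool_error_message(failed))  # Not enough credits
print(tool_error_message(ok))      # None
```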
Getting Help
If you encounter issues:
- Check the troubleshooting guide
- Contact support through the help center
Best Practices
Security
- OAuth tokens are automatically managed and refreshed
- Tokens are short-lived for enhanced security
- Review and revoke OAuth applications as needed in your dashboard
- Monitor OAuth usage and active connections
Performance
- Use specific, clear prompts for better extraction accuracy
- Leverage existing scrapers for recurring data extraction needs
- Monitor your credit usage and set up alerts
Integration
- Start with simple extractions to test your setup
- Gradually build more complex workflows
- Document your common extraction patterns for reuse