This spider used to scrape successfully, but it no longer returns any data: (a) the logged run time is longer than before, and (b) every request returns a 403 error.
Job Stats:
downloader/request_bytes: 77632
downloader/request_count: 241
downloader/request_method_count/GET: 241
downloader/response_bytes: 375464
downloader/response_count: 241
downloader/response_status_count/403: 241
finish_reason: finished
finish_time: 1563336236401
httperror/response_ignored_count: 240
httperror/response_ignored_status_count/403: 240
log_count/INFO: 272
memusage/max: 54398976
memusage/startup: 51970048
response_received_count: 241
scheduler/dequeued: 240
scheduler/dequeued/disk: 240
scheduler/enqueued: 240
scheduler/enqueued/disk: 240
start_time: 1563334774539
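For reference, the timestamps and status counters in these stats can be turned into a run duration and a blocked-response share. This is only a minimal sketch: it assumes `start_time` and `finish_time` are Unix epoch milliseconds (their magnitude suggests so, but the stats do not say), and the `stats` dict just copies the values listed above.

```python
from datetime import timedelta

# Values copied from the job stats above.
stats = {
    "start_time": 1563334774539,
    "finish_time": 1563336236401,
    "downloader/response_count": 241,
    "downloader/response_status_count/403": 241,
}

# Assumption: both timestamps are Unix epoch milliseconds.
duration = timedelta(milliseconds=stats["finish_time"] - stats["start_time"])

# Fraction of downloaded responses that came back as 403.
blocked_ratio = (
    stats["downloader/response_status_count/403"]
    / stats["downloader/response_count"]
)

print(f"run time: {duration}")
print(f"403 share: {blocked_ratio:.0%}")
```

Under that assumption the job ran for roughly 24 minutes, and every one of the 241 responses was a 403.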
When I printed response locally, this is the result:
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Access denied | www.konga.com used Cloudflare to restrict access</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1>
<span class="cf-error-type" data-translate="error">Error</span>
<span class="cf-error-code">1020</span>
<small class="heading-ray-id">Ray ID: 4f796addfbbdbba0 • 2019-07-17 04:19:24 UTC</small>
</h1>
<h2 class="cf-subheadline">Access denied</h2>
</div><!-- /.header -->
<section></section><!-- spacer -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="what_happened">What happened?</h2>
<p>This website is using a security service to protect itself from online attacks.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">Cloudflare Ray ID: <strong>4f796addfbbdbba0</strong></span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Your IP</span>: 86.144.173.228</span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Performance & security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body>
</html>
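A page like this can be recognised programmatically, so that blocked responses are counted or logged instead of silently ignored. The following is a rough sketch only: `cloudflare_error_code` and `CF_ERROR_CODE` are hypothetical names, not part of Scrapy, and the check relies solely on the markup visible in the output above (the `cf-error-details` container and the `cf-error-code` span), which Cloudflare may change.

```python
import re

# Hypothetical helper (not part of Scrapy): extracts the Cloudflare error
# code from a block page shaped like the one printed above.
CF_ERROR_CODE = re.compile(r'<span class="cf-error-code">(\d+)</span>')


def cloudflare_error_code(status, body):
    """Return the Cloudflare error code as a string, or None if the
    response does not look like a Cloudflare block page."""
    if status != 403 or "cf-error-details" not in body:
        return None
    match = CF_ERROR_CODE.search(body)
    return match.group(1) if match else None


# Two lines taken from the response shown above.
sample = '''<div id="cf-error-details" class="cf-error-details-wrapper">
<span class="cf-error-code">1020</span>'''

print(cloudflare_error_code(403, sample))
```

In a spider callback this could be called with `response.status` and `response.text`. Note that Scrapy drops 403 responses before they reach the callback by default (hence `httperror/response_ignored_count` in the stats), so the spider would need `handle_httpstatus_list = [403]` for the check to run.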
thriveni (Admin) posted over 5 years ago, Best Answer
Hello,
A 403 indicates that the target website is banning the Scrapy Cloud IPs. You would need to use a proxy service such as Crawlera to crawl the website through Scrapy Cloud.
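As a rough illustration of the proxy suggestion, the settings below sketch how Crawlera might be enabled in a project's `settings.py`. This assumes the `scrapy-crawlera` plugin is installed in the project; the API key is a placeholder, and whether a proxy actually gets past this particular site's Cloudflare rule is not something these settings can guarantee.

```python
# settings.py (sketch). Assumes the scrapy-crawlera package is installed.
DOWNLOADER_MIDDLEWARES = {
    # Routes outgoing requests through Crawlera.
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your Crawlera API key>"  # placeholder, not a real key
```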