
403 Errors

I was initially able to scrape successfully with this spider, but it no longer scrapes any information: (a) the log time is extended, and (b) all of the requests return a 403 error.


Job Stats:


downloader/request_bytes: 77632
downloader/request_count: 241
downloader/request_method_count/GET: 241
downloader/response_bytes: 375464
downloader/response_count: 241
downloader/response_status_count/403: 241
finish_reason: finished
finish_time: 1563336236401
httperror/response_ignored_count: 240
httperror/response_ignored_status_count/403: 240
log_count/INFO: 272
memusage/max: 54398976
memusage/startup: 51970048
response_received_count: 241
scheduler/dequeued: 240
scheduler/dequeued/disk: 240
scheduler/enqueued: 240
scheduler/enqueued/disk: 240
start_time: 1563334774539

When I printed response locally, this is the result:


<!DOCTYPE html>

<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->

<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->

<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->

<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->

<head>

<title>Access denied | www.konga.com used Cloudflare to restrict access</title>

<meta charset="UTF-8" />

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />

<meta name="robots" content="noindex, nofollow" />

<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />

<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />

<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->

<style type="text/css">body{margin:0;padding:0}</style>



<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->

<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->




</head>

<body>

 <div id="cf-wrapper">

 <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>

 <div id="cf-error-details" class="cf-error-details-wrapper">

 <div class="cf-wrapper cf-header cf-error-overview">

 <h1>

 <span class="cf-error-type" data-translate="error">Error</span>

 <span class="cf-error-code">1020</span>

 <small class="heading-ray-id">Ray ID: 4f796addfbbdbba0 &bull; 2019-07-17 04:19:24 UTC</small>

 </h1>

 <h2 class="cf-subheadline">Access denied</h2>

 </div><!-- /.header -->


 <section></section><!-- spacer -->


 <div class="cf-section cf-wrapper">

 <div class="cf-columns two">

 <div class="cf-column">

 <h2 data-translate="what_happened">What happened?</h2>

 <p>This website is using a security service to protect itself from online attacks.</p>

 </div>


 

 </div>

 </div><!-- /.section -->


 <div class="cf-error-footer cf-wrapper">

 <p>

 <span class="cf-footer-item">Cloudflare Ray ID: <strong>4f796addfbbdbba0</strong></span>

 <span class="cf-footer-separator">&bull;</span>

 <span class="cf-footer-item"><span>Your IP</span>: 86.144.173.228</span>

 <span class="cf-footer-separator">&bull;</span>

 <span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>

 

 </p>

</div><!-- /.error-footer -->



 </div><!-- /#cf-error-details -->

 </div><!-- /#cf-wrapper -->


 <script type="text/javascript">

 window._cf_translation = {};

 

 

</script>


</body>

</html>
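
The page printed above is a Cloudflare "Access denied" page, with the error code (1020 here) in the `cf-error-code` span. As a rough sketch, a spider could recognise this kind of block page by checking for that span. The helper names `cloudflare_error_code` and `is_cloudflare_block` are made up for this illustration and are not part of Scrapy, and the regular expression assumes the markup looks like the dump above.

```python
import re
from typing import Optional

# Matches the error-code span seen in the dump above, e.g.
# <span class="cf-error-code">1020</span>
_CF_CODE_RE = re.compile(r'<span class="cf-error-code">\s*(\d+)\s*</span>')


def cloudflare_error_code(html: str) -> Optional[str]:
    """Return the Cloudflare error code found in the page, or None."""
    match = _CF_CODE_RE.search(html)
    return match.group(1) if match else None


def is_cloudflare_block(status: int, html: str) -> bool:
    """True when a 403 response body looks like a Cloudflare error page."""
    return status == 403 and cloudflare_error_code(html) is not None
```

Note that Scrapy drops 403 responses before they reach the spider callback by default (hence the `httperror/response_ignored_count` stat above), so a check like this would only run if the spider lets that status through, for example by setting `handle_httpstatus_list = [403]` on the spider.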



Best Answer

Hello,


The 403 responses indicate that the target website is banning the Scrapy Cloud IPs. You would need to use a proxy service such as Crawlera to crawl the website through Scrapy Cloud.
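
For illustration, enabling Crawlera in a Scrapy project might look roughly like the settings below. This is a minimal sketch that assumes the `scrapy-crawlera` plugin is installed in the project; the API key value is a placeholder, and on Scrapy Cloud these settings can also be supplied through the project or spider settings rather than in `settings.py`.

```python
# settings.py (sketch): route requests through Crawlera.
# Assumes the scrapy-crawlera plugin is installed: pip install scrapy-crawlera

# Register the Crawlera downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# Turn the middleware on and provide the account key.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your Crawlera API key>"  # placeholder, not a real key
```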
