Scraping a large URL list

Posted over 6 years ago by QLMarketing

Answered

I have a large URL list (50k) in the form of a CSV file. Locally, my spider can open the CSV like any other file and crawl the URLs. Is it possible to parse URLs from a CSV on Scrapinghub? When I deploy my project as is, Scrapy Cloud does not know where to find the CSV. Any ideas would be welcome.



thriveni posted over 6 years ago Admin Best Answer

You need to declare the files in the package_data section of your setup.py file, as described in Deploying non-code files.


Regards,

Thriveni Patil
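
For illustration, a setup.py along these lines might look as follows. This is only a sketch: the package name `myproject` and the `resources/*.csv` pattern are assumptions, not taken from the original poster's project, so adjust them to your own layout.

```python
# setup.py (sketch; "myproject" and "resources/*.csv" are hypothetical names)
from setuptools import setup, find_packages

setup(
    name="project",
    version="1.0",
    packages=find_packages(),
    # Ship every CSV under myproject/resources/ together with the code,
    # so the file is present in the deployed package.
    package_data={
        "myproject": ["resources/*.csv"],
    },
    # Tells Scrapy where the project settings module lives.
    entry_points={
        "scrapy": ["settings = myproject.settings"],
    },
)
```

With this layout the CSV would live at myproject/resources/urls.csv, next to the spiders package, and the project would need to be redeployed after the change.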



Comments

rafalf posted over 6 years ago

Hi,

Can you share your code, please?

How do you access the CSV from within spiders? I'm having a similar issue.
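
One possible way to read a CSV that was shipped as package data is the standard library's `pkgutil.get_data`, which returns the file contents as bytes. The sketch below is not the original poster's code. It assumes a hypothetical package `myproject` with the file at `resources/urls.csv`, no header row, and the URLs in the first column.

```python
import csv
import io
import pkgutil


def parse_url_csv(raw, column=0):
    """Return the non-empty values found in one column of CSV bytes."""
    # utf-8-sig tolerates a byte order mark written by spreadsheet tools.
    text = raw.decode("utf-8-sig")
    urls = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and rows that are too short to hold the column.
        if len(row) > column and row[column].strip():
            urls.append(row[column].strip())
    return urls


def load_urls(package="myproject", resource="resources/urls.csv"):
    """Read a CSV bundled as package data (names are hypothetical)."""
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        # get_data returns None when the package cannot be located.
        raise FileNotFoundError(f"{resource} not found in package {package}")
    return parse_url_csv(raw)
```

Inside a spider, `start_requests` could then loop over `load_urls()` and yield one `scrapy.Request(url)` per entry. Reading through the package rather than a filesystem path is what lets the same code work both locally and after deployment.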


