Answered

Scraping a large URL list

I have a large URL list (50k URLs) in a CSV file. Locally, my spider can open the CSV like any other file and crawl the URLs. Is it possible to read URLs from a CSV on Scrapinghub? When I deploy my project as is, Scrapy Cloud cannot find the CSV. Any ideas would be welcome.


Best Answer

You need to declare the files in the package_data section of your setup.py file, as described in "Deploying non-code files".
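As a rough illustration, a setup.py along these lines could ship the CSV with the deployed project. This is only a sketch: the package name myproject, the resources/ folder and the *.csv pattern are assumptions for the example, not details from the original question, so adjust them to your own project layout.

```python
# setup.py (sketch; "myproject" and "resources/*.csv" are hypothetical names)
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    packages=find_packages(),
    # Include non-code files that live inside the package directory,
    # e.g. myproject/resources/urls.csv
    package_data={
        "myproject": ["resources/*.csv"],
    },
    # Tells Scrapy where the project settings module lives
    entry_points={"scrapy": ["settings = myproject.settings"]},
    zip_safe=False,
)
```

The paths in package_data are relative to the package directory, so in this sketch the CSV would need to sit under myproject/resources/ rather than in the project root.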


Regards,

Thriveni Patil


Hi,

Could you share your code, please?

How do you access the CSV from within the spider? I'm having a similar issue.
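One possible way to read the file from inside a spider, once it has been shipped as package data, is to load it with pkgutil.get_data instead of opening a filesystem path. The sketch below is not the original poster's code. It assumes a hypothetical package called myproject, a file at resources/urls.csv inside that package, UTF-8 encoding, and one URL in the first column with no header row.

```python
import csv
import io
import pkgutil


def parse_urls(data):
    """Return the non-empty first-column values from raw CSV bytes."""
    text = data.decode("utf-8")  # assumption: the CSV is UTF-8 encoded
    urls = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and rows whose first cell is empty
        if row and row[0].strip():
            urls.append(row[0].strip())
    return urls


def load_urls(package="myproject", resource="resources/urls.csv"):
    """Read a packaged CSV and return its URLs (names are hypothetical)."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(f"{resource} not found in package {package}")
    return parse_urls(data)
```

In a spider, start_requests could then loop over load_urls() and yield a scrapy.Request for each URL. Reading through the package loader rather than a relative path avoids depending on where the deployed project is unpacked.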

