Scraping a large URL list

Posted over 7 years ago by QLMarketing

Answered

QLMarketing

I have a large list of URLs (50k) in a CSV file. Locally, my spider opens the CSV like any other file and crawls the URLs. Is it possible to read URLs from a CSV on Scrapinghub? When I deploy my project as is, Scrapy Cloud cannot find the CSV. Any ideas would be welcome.

thriveni posted over 7 years ago Admin Best Answer

You need to declare the files in the package_data section of your setup.py file, as described in the "Deploying non-code files" article.

Regards,

Thriveni Patil
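A minimal setup.py along those lines might look like the sketch below. It assumes the Scrapy package is called myproject and the CSV sits at myproject/resources/urls.csv; both names are hypothetical. The entry_points line mirrors the one shub deploy writes into an auto-generated setup.py, and package_data is what makes setuptools include the CSV in the built egg.

```python
# setup.py at the project root (next to scrapy.cfg).
# "myproject" and "resources/urls.csv" are hypothetical names; use your own.
from setuptools import find_packages, setup

setup(
    name="project",
    version="1.0",
    packages=find_packages(),
    # Patterns are relative to the package directory, so this bundles
    # myproject/resources/urls.csv into the built egg.
    package_data={
        "myproject": ["resources/*.csv"],
    },
    # Tells Scrapy where the project settings module lives.
    entry_points={"scrapy": ["settings = myproject.settings"]},
)
```

The CSV has to live inside the package directory for package_data to pick it up, and the project has to be deployed again so the new egg contains the file.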

rafalf posted about 7 years ago

Hi,

Can you share your code, please?

How do you access the CSV from within your spiders? I'm having a similar issue.
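As a sketch of the reading side: once the CSV is packaged, it can be loaded through the package with the standard library's pkgutil.get_data instead of a filesystem path, so it no longer depends on the working directory the spider runs in. The function name read_urls, the package name, the resource path, and the assumption that the CSV has a header row with a url column are all hypothetical.

```python
import csv
import io
import pkgutil


def read_urls(package, resource, column="url"):
    """Return the non-empty values of `column` from a CSV bundled in `package`.

    pkgutil.get_data() asks the package's loader for the file, so it does not
    depend on the current working directory and also works from a zipped egg.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # The package could not be located or its loader cannot serve data files.
        raise FileNotFoundError(f"{resource!r} not available from package {package!r}")
    # "utf-8-sig" also strips the byte-order mark some spreadsheet exports add.
    reader = csv.DictReader(io.StringIO(data.decode("utf-8-sig"), newline=""))
    urls = []
    for row in reader:
        # Short rows yield None for missing fields; treat those as empty.
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls
```

Inside a spider, start_requests can then loop over read_urls("myproject", "resources/urls.csv") and yield a scrapy.Request for each URL.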
