We need a static URL for scraping results.

Posted over 5 years ago by İsmail Aras

İsmail Aras

We need a static URL for scraping results. Right now the URL changes after every run. What is the solution for this?

<project_id>/<spider_id>/<job_id>

How can I replace <job_id> so that it defaults to the latest job?

thriveni posted over 5 years ago Admin Best Answer

You can use the Scrapinghub Jobs API and the python-scrapinghub library. This library interacts with Scrapy Cloud, so you can use it in your spider to get the list of jobs and pick the latest one.
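A minimal Python sketch of that approach, assuming the python-scrapinghub client (`ScrapinghubClient`, installed with `pip install scrapinghub`) and assuming that job keys follow the `<project_id>/<spider_id>/<job_id>` pattern from the question, with `<job_id>` growing by one for every new run of a spider. Under that assumption the highest `<job_id>` among finished jobs is the most recently started finished run. The helper names `job_number`, `latest_job_key` and `fetch_latest_items` are hypothetical, not part of the library.

```python
# Sketch only: helper names are hypothetical; assumes job keys look like
# "<project_id>/<spider_id>/<job_id>" with <job_id> increasing per run.


def job_number(key):
    """Return the <job_id> part of a '<project_id>/<spider_id>/<job_id>' key."""
    return int(key.split("/")[2])


def latest_job_key(job_summaries):
    """Return the key with the highest <job_id>, or None if there are no jobs.

    Each summary is assumed to be a dict with a 'key' entry. The comparison is
    numeric, so '123/1/10' correctly beats '123/1/9'.
    """
    summaries = list(job_summaries)
    if not summaries:
        return None
    return max(summaries, key=lambda job: job_number(job["key"]))["key"]


def fetch_latest_items(apikey, project_id, spider_name):
    """Yield the items of the spider's latest finished job (needs network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from scrapinghub import ScrapinghubClient

    client = ScrapinghubClient(apikey)
    project = client.get_project(project_id)
    # Only finished jobs, so a run that is still in progress is skipped.
    summaries = project.jobs.iter(spider=spider_name, state="finished")
    key = latest_job_key(summaries)
    if key is None:
        return
    yield from client.get_job(key).items.iter()


# Usage (placeholders; needs a real API key and network access):
# for item in fetch_latest_items("APIKEY", 12345, "myspider"):
#     print(item)
```

Picking the job by number rather than by list position avoids relying on the order in which the API returns jobs.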

Thanks,

Thriveni.

thriveni posted over 5 years ago Admin

You can also fetch data from the latest completed job in CSV format using this URL:

https://app.scrapinghub.com/api/items.csv?project=PROJECTNUMBER&spider=SPIDERNAME&include_headers=1&fields=FIELDNAME1,FIELDNAME2&apikey=APIKEY

You need to replace:

PROJECTNUMBER with your project number
SPIDERNAME with your spider name
FIELDNAME1, FIELDNAME2 with the names of the fields, in the order you want them to appear as CSV columns
APIKEY with your API key
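Since this URL stays the same from run to run, it can be built once and reused. A small Python sketch that fills in the four values listed above and downloads the CSV using only the standard library. The function names and the example values (project `12345`, spider `myspider`, fields `name` and `price`) are hypothetical placeholders, and decoding the response as UTF-8 is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://app.scrapinghub.com/api/items.csv"


def latest_items_csv_url(project, spider, fields, apikey):
    """Build the items.csv URL for the latest completed job of a spider."""
    params = {
        "project": project,
        "spider": spider,
        "include_headers": 1,
        # CSV columns follow the order of this list.
        "fields": ",".join(fields),
        "apikey": apikey,
    }
    # Keep commas literal so the URL matches the documented form.
    return BASE_URL + "?" + urlencode(params, safe=",")


def download_latest_csv(project, spider, fields, apikey):
    """Fetch the CSV text (needs network access and a valid API key)."""
    url = latest_items_csv_url(project, spider, fields, apikey)
    with urlopen(url) as response:
        # Assumption: the CSV is UTF-8 encoded.
        return response.read().decode("utf-8")


# Usage (placeholders):
# print(latest_items_csv_url(12345, "myspider", ["name", "price"], "APIKEY"))
```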
