We need a static URL for the scraping results. Right now the URL changes after every run, because it includes the job ID:
<project_id>/<spider_id>/<job_id>
How can <job_id> default to the latest job?
thriveni posted almost 6 years ago Admin Best Answer
You can use the Scrapinghub Jobs API via the python-scrapinghub library. The library interacts with Scrapy Cloud, so you can use it inside the spider to fetch the job list and pick the latest job.
Thanks,
Thriveni.
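A minimal sketch of that approach, assuming the `ScrapinghubClient` interface from recent python-scrapinghub versions; the API key, project ID, and spider name are placeholders you supply:

```python
def items_url(job_key, fmt="json"):
    """Build the stable storage-API URL for a job's items.

    A job key looks like '<project_id>/<spider_id>/<job_id>'.
    """
    return "https://storage.scrapinghub.com/items/{}?format={}".format(job_key, fmt)


def latest_job_key(apikey, project_id, spider_name):
    """Return the job key of the most recent finished job for a spider."""
    from scrapinghub import ScrapinghubClient  # pip install scrapinghub

    client = ScrapinghubClient(apikey)
    project = client.get_project(project_id)
    # jobs.iter() yields job summaries newest-first; limit to one finished job
    for summary in project.jobs.iter(spider=spider_name, state="finished", count=1):
        return summary["key"]  # e.g. '123456/1/42'
    return None
```

Since the returned key has the `<project_id>/<spider_id>/<job_id>` form, it can be plugged straight into the storage items endpoint to read the latest results.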
2 Comments
thriveni posted almost 6 years ago Admin

You can also fetch the data from the latest completed job in CSV format using this URL:

https://app.scrapinghub.com/api/items.csv?project=PROJECTNUMBER&spider=SPIDERNAME&include_headers=1&fields=FIELDNAME1,FIELDNAME2&apikey=APIKEY

You need to replace:
- PROJECTNUMBER with your project number
- SPIDERNAME with your spider name
- FIELDNAME1,FIELDNAME2 with the names of the fields, in the order you want them to appear as CSV columns
- APIKEY with your API key
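As a sketch, that URL can be assembled with the standard library; the parameter names come straight from the URL above, and the values here are placeholders:

```python
from urllib.parse import urlencode


def latest_items_csv_url(project, spider, fields, apikey):
    """Build the items.csv URL that always returns the latest
    completed job's items for the given spider."""
    params = {
        "project": project,
        "spider": spider,
        "include_headers": 1,
        # field order here controls CSV column order
        "fields": ",".join(fields),
        "apikey": apikey,
    }
    return "https://app.scrapinghub.com/api/items.csv?" + urlencode(params)
```

Because this endpoint takes the spider name rather than a job ID, the resulting URL is static across runs.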