We need a static URL for the scraping results. Right now the URL changes after every run. What is the solution for this?
<project_id>/<spider_id>/<job_id>
How can <job_id> default to the latest job?
thriveni posted almost 5 years ago · Admin · Best Answer
You can use the Scrapinghub Jobs API and the python-scrapinghub library. The library interacts with Scrapy Cloud, so you can use it in your spider to get the job list and pick the latest job.
Thanks,
Thriveni.
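As a minimal sketch of that approach, using only the standard library to call the Jobs API directly (python-scrapinghub wraps the same API): the endpoint path, parameter names, and response shape below are assumptions based on the legacy Jobs API, and the project number and API key are placeholders.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://app.scrapinghub.com/api"  # legacy Jobs API root (assumption)

def jobs_list_url(project, apikey, state="finished", count=1):
    """Build the Jobs API URL listing the most recent jobs first."""
    params = urllib.parse.urlencode({
        "project": project,
        "state": state,
        "count": count,
        "apikey": apikey,
    })
    return f"{API_ROOT}/jobs/list.json?{params}"

def latest_job_id(project, apikey):
    """Fetch the job list and return the id of the newest finished job, or None."""
    with urllib.request.urlopen(jobs_list_url(project, apikey)) as resp:
        data = json.load(resp)
    jobs = data.get("jobs", [])
    return jobs[0]["id"] if jobs else None
```

The returned job id can then be substituted for `<job_id>` in the results URL, so the spider (or any downstream consumer) always resolves to the latest run.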
thriveni posted almost 5 years ago · Admin
You can also fetch the data from the latest completed job in CSV format using the URL
https://app.scrapinghub.com/api/items.csv?project=PROJECTNUMBER&spider=SPIDERNAME&include_headers=1&fields=FIELDNAME1,FIELDNAME2&apikey=APIKEY
You need to replace:
PROJECTNUMBER with your project number
SPIDERNAME with your spider name
FIELDNAME1,FIELDNAME2 with the names of the fields, in the order you want them to appear in the CSV columns
APIKEY with your API key
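The substitutions above can be wrapped in a small helper so the URL is built consistently. The project number, spider name, and field names below are placeholder values.

```python
import urllib.parse

def latest_items_csv_url(project, spider, fields, apikey):
    """Build the items.csv URL that always points at the latest completed job."""
    params = urllib.parse.urlencode({
        "project": project,
        "spider": spider,
        "include_headers": 1,          # emit a header row in the CSV
        "fields": ",".join(fields),    # column order in the CSV
        "apikey": apikey,
    })
    return "https://app.scrapinghub.com/api/items.csv?" + params

# Example with placeholder values:
url = latest_items_csv_url("12345", "myspider", ["title", "price"], "APIKEY")
```

Because this URL is keyed to the project and spider rather than a job id, it is stable across runs.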