I'm building a spider using Portia and would like to display the scraped items on my website. I have a few ideas on how to do that and would appreciate any help in deciding which is the best way to go:
Method 1:
Set up a cron job that fetches data from Scrapy Cloud API
store the data in a local db
fetch data from local db to be displayed for users.
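For illustration, the three steps of Method 1 could be sketched roughly as below. This is only a sketch under assumptions: the items URL is passed in by the caller (look up the real one for your job in the Scrapy Cloud documentation), the API key is assumed to be sent as the HTTP basic auth username, each item is assumed to carry a `url` field, and the helper names `fetch_items`, `store_items` and `search_items` are made up for this example.

```python
import base64
import json
import sqlite3
import urllib.request


def fetch_items(items_url, api_key):
    """Download a job's items as a JSON list.

    Assumption: the endpoint returns a JSON array when asked for
    application/json and accepts the API key as the basic auth username.
    """
    request = urllib.request.Request(items_url)
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def store_items(conn, items):
    """Upsert items into a local table, keyed by each item's 'url' field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO items (url, data) VALUES (?, ?)",
            [(item["url"], json.dumps(item)) for item in items],
        )


def search_items(conn, text):
    """Return stored items whose JSON contains the given text."""
    rows = conn.execute(
        "SELECT data FROM items WHERE data LIKE ? ORDER BY url", (f"%{text}%",)
    )
    return [json.loads(row[0]) for row in rows]
```

A script run from cron would call `fetch_items` and then `store_items`, and the website would only ever call `search_items` against the local database, so filtering and searching no longer depend on what the remote API supports.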
Method 2: Display data for users directly using Scrapy Cloud API. I'm not sure this is feasible for a number of reasons:
Scrapy Cloud API requests might be different for each job.
Scrapy Cloud API may not support features such as filtering and searching.
Method 3: Set up a dataset and request data from this dataset. Is there any way to access datasets via an API?
Thanks in advance.
Best Answer
tom said almost 7 years ago:
One good option is to set up a REST API on your site so that your spider can send its data directly to it; your site can then process the data and display it.
Another good option is to have the spider dump its data to an S3 bucket, where a cron job on your site can pick it up.
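To make the first option a bit more concrete, here is a minimal sketch of the receiving side, using only the Python standard library. Everything named here (`accept_item`, `ItemHandler`, the in-memory `ITEMS` list, port 8000) is an assumption for illustration; a real site would add authentication and write to its own database instead of a list.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ITEMS = []  # in-memory store for the sketch; a real site would use its database


def accept_item(raw_body, store):
    """Validate one posted item and store it; return the HTTP status to send."""
    try:
        item = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    if not isinstance(item, dict):
        return 400  # expect one JSON object per request
    store.append(item)
    return 201


class ItemHandler(BaseHTTPRequestHandler):
    """Accepts POST requests whose body is a single scraped item as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = accept_item(self.rfile.read(length), ITEMS)
        self.send_response(status)
        self.end_headers()


# To try it locally (hypothetical host and port):
# HTTPServer(("127.0.0.1", 8000), ItemHandler).serve_forever()
```

The spider side would then POST each item as JSON to this endpoint. Keeping the validation in a plain function like `accept_item` makes it easy to reuse if the data later arrives another way, for example from files picked up from an S3 bucket.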
vaz
Hi Sano,
I wouldn't recommend leaving that decision to us.
I suggest trying both methods and then choosing the one that better fits your project. If you like, you can share with us which one you chose and why.
That could be extremely useful for other users with similar inquiries.
Thanks for making this Community better.
Best regards,
Pablo