Answered

Best Practices

Hello,


I'm building a spider using Portia and would like to display the scraped items on my website. I have a few ideas on how to do that and would appreciate any help in deciding which is the best way to go:

  • Method 1:
  1. Set up a cron job that fetches data from the Scrapy Cloud API.
  2. Store the data in a local db.
  3. Fetch data from the local db to display to users.
  • Method 2: Display data to users directly using the Scrapy Cloud API. I'm not sure this is feasible, for a number of reasons:
  1. Scrapy Cloud API requests might be different for each job.
  2. The Scrapy Cloud API may not support features such as filtering and searching.
  • Method 3: Set up a dataset and request data from this dataset. Is there any way to access datasets via an API?
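For reference, Method 1 could look roughly like the sketch below. It is only an illustration under some assumptions: the items URL pattern and the use of the API key as the basic-auth username should be checked against the current Scrapy Cloud API documentation, and the `items` table, the `url` key field, and all function names are made up for the example. Searching happens in the local db, which sidesteps the filtering concern raised under Method 2.

```python
import json
import sqlite3
import urllib.request
from base64 import b64encode

# Assumption: items endpoint pattern; verify it in the Scrapy Cloud API docs.
ITEMS_URL = "https://storage.scrapinghub.com/items/{job_key}?format=json"


def fetch_items(job_key, api_key):
    """Download the items of one job as a list of dicts (network call)."""
    request = urllib.request.Request(ITEMS_URL.format(job_key=job_key))
    # Assumption: the API key is sent as the basic-auth username, empty password.
    token = b64encode(f"{api_key}:".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def store_items(conn, items, key_field="url"):
    """Insert or replace items in a local table, keyed on a hypothetical field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    rows = [(item[key_field], json.dumps(item)) for item in items]
    conn.executemany("INSERT OR REPLACE INTO items (key, data) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def search_items(conn, text):
    """Naive substring search over the stored JSON, for display on the site."""
    cursor = conn.execute(
        "SELECT data FROM items WHERE data LIKE ? ORDER BY key", (f"%{text}%",)
    )
    return [json.loads(row[0]) for row in cursor]
```

A cron entry would then call `store_items(conn, fetch_items(job_key, api_key))` on a schedule, while the website only ever reads from the local db through something like `search_items`.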
Thanks in advance.


Best Answer

Another good option is to set up a REST API on your site that your spider sends its data to directly. The site can then process that data and display it.
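To make the idea concrete, here is a minimal sketch of what the receiving side might look like, using only the Python standard library. Everything in it is an assumption for illustration: the JSON-list payload shape, the in-memory `RECEIVED` list standing in for a real database, and the names `handle_payload`, `ItemHandler` and `serve`. A real endpoint would also need authentication, which is left out here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # hypothetical stand-in for the site's real database


def handle_payload(raw_body, store):
    """Validate a JSON list of item objects and append it to store.

    Returns an (HTTP status, response dict) pair.
    """
    try:
        items = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(items, list) or not all(isinstance(i, dict) for i in items):
        return 400, {"error": "expected a JSON list of objects"}
    store.extend(items)
    return 201, {"stored": len(items)}


class ItemHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_payload(self.rfile.read(length), RECEIVED)
        body = json.dumps(reply).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the endpoint until interrupted (blocking call)."""
    HTTPServer(("127.0.0.1", port), ItemHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally; the spider side would then POST batches of items to it as a JSON list. Keeping the validation in `handle_payload` means it can be tried without starting a server.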


Another good option is to dump the data from your spider to an S3 bucket, where a cron job on your site can then pick it up.
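As a rough sketch of the pick-up side, the cron job could read the exported files with something like the code below. It assumes the spider writes JSON Lines files under one prefix, that the third-party `boto3` package is installed, and that credentials are configured in the usual way. The bucket name, the prefix and the function names are hypothetical. For brevity it reads only the first page of the listing, so a bucket with many objects would need pagination.

```python
import json


def read_feed(s3_client, bucket, prefix):
    """Return all items from JSON Lines objects stored under prefix.

    s3_client is expected to behave like boto3.client("s3").
    """
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    items = []
    # "Contents" is absent when nothing matches the prefix.
    for entry in listing.get("Contents", []):
        body = s3_client.get_object(Bucket=bucket, Key=entry["Key"])["Body"].read()
        for line in body.decode("utf-8").splitlines():
            if line.strip():  # skip blank lines
                items.append(json.loads(line))
    return items


def main():
    import boto3  # third-party; imported here so read_feed stays easy to test

    # Hypothetical bucket and prefix; replace with your own.
    for item in read_feed(boto3.client("s3"), "my-scrape-bucket", "feeds/"):
        print(item)
```

Passing the client in as an argument keeps `read_feed` independent of AWS, so it can be exercised with a small fake object before pointing it at a real bucket.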


Hi Sano,


I wouldn't recommend leaving that decision to us.


I suggest trying the methods and then choosing the one that best fits your project. If you like, you can share with us which one you chose and why.


That could be extremely useful for other users with similar inquiries.


Thanks for making this Community better.


Best regards,


Pablo
