rhiaro
Hi there,
When I run my spiders locally, they download JSON files from some API endpoints and save them to disk (using the Files pipeline component). When I run them in Scrapy Cloud, I can see each item with the file's URL and path set on it, but nowhere can I find the contents of the file itself. The only options I see are for downloading a dump of each item's metadata.
Thanks!
Best Answer
nestor said over 5 years ago
There's no write access to Scrapy Cloud. Instead, you'll need to use one of the alternative file storage backends supported by the Files pipeline: S3 or GCS.
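For reference, a minimal settings.py sketch of what that looks like; the bucket name, path, and credentials below are placeholders rather than values from this thread:

    # settings.py -- enable the Files pipeline and point it at S3 instead of local disk
    ITEM_PIPELINES = {
        "scrapy.pipelines.files.FilesPipeline": 1,
    }

    # Files are written under this prefix instead of a local folder
    FILES_STORE = "s3://my-files-bucket/json-dumps/"

    # Credentials used by Scrapy's S3 storage (botocore, or boto on older Scrapy versions, is required)
    AWS_ACCESS_KEY_ID = "<access-key-id>"
    AWS_SECRET_ACCESS_KEY = "<secret-access-key>"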
rhiaro said over 5 years ago
Thanks for replying, nestor. When I use the Files pipeline, it seems to be downloading the files successfully. Or at least, the metadata implies it is, and the rest of my script can read and use them... wherever they are. So there must be somewhere to retrieve them from.
nestor said over 5 years ago
Could you share a job ID?
Well, you do have access to the /scrapinghub and /tmp folders, but they get cleared after the job runs, so it would still make sense to export to external storage anyway.
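In other words, a setup like the following only appears to work because those paths are writable for the duration of the job (a sketch, assuming the Files pipeline is already enabled):

    # settings.py -- writable while the job runs, but wiped when the container ends
    FILES_STORE = "/tmp/files"   # or a path under /scrapinghub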
rhiaro said over 5 years ago
Once the file is downloaded, it's read back from disk and posted to an API endpoint. The other end is receiving it. I could share a job ID, but how would that give you access to anything useful?
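A rough sketch of that read-back-and-post step as described; the endpoint URL and helper name here are hypothetical, not taken from the thread. The Files pipeline records each download on the item under the "files" field, with "path" relative to FILES_STORE:

    import os
    import requests  # assuming an HTTP client like requests is used for the upload

    FILES_STORE = "/tmp/files"  # must match the value in settings.py

    def post_downloaded_file(item, endpoint="https://example.org/api/upload"):  # hypothetical endpoint
        """Read the file saved by the Files pipeline and POST it to the API."""
        # FilesPipeline stores results as dicts with "url", "path" and "checksum"
        relative_path = item["files"][0]["path"]
        with open(os.path.join(FILES_STORE, relative_path), "rb") as f:
            response = requests.post(endpoint, files={"file": f})
        response.raise_for_status()

This works inside the job because the file still exists on the container's local disk at that point.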
nestor said over 5 years ago
Read my last comment: you have write access to the /scrapinghub and /tmp folders, which is why you are able to use them. But once the job ends, the container (Scrapy Cloud unit) gets wiped, so you need to export the files somewhere else before the job ends; use the built-in support for S3 or GCS.
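If GCS is the preferred backend rather than S3, the equivalent settings sketch would be along these lines (bucket and project ID are placeholders; the google-cloud-storage package is required):

    # settings.py -- Google Cloud Storage backend for the Files pipeline
    FILES_STORE = "gs://my-files-bucket/json-dumps/"
    GCS_PROJECT_ID = "my-project-id"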
rhiaro said over 5 years ago
I sent my reply before I saw that you had edited your last comment to include that information.
nestor said over 5 years ago
I see. Let me know if you have further questions or if there's anything else I can assist you with.
- Unable to select Scrapy project in GitHub
- ScrapyCloud can't call spider?
- Unhandled error in Deferred
- Item API - Filtering
- newbie to web scraping but need data from zillow
- ValueError: Invalid control character
- Cancelling account
- Best Practices
- Beautifulsoup with ScrapingHub
- Delete a project in ScrapingHub
See all 458 topics