RestStep said
Apparently, there is a 1 MB limit on serialized items. Is there a way to remove the limitation? I need at least around 6 MB.
Best Answer
nestor said over 6 years ago
There's no way to remove the limitation. Depending on your use case: one option is to split your items into several smaller ones, for example accumulated data from a paginated list. Another is to enable the Page Storage add-on and access the raw HTML pages from Collections (if you are storing raw HTML as an item). Another is to store your items in Amazon S3 using FeedExport.
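A minimal sketch of the first suggestion above (yield one small item per page instead of accumulating a paginated list into a single oversized item), assuming a generic Scrapy spider; the URL, selectors, and field names are illustrative placeholders, not taken from this thread:

    import scrapy

    class PaginatedExampleSpider(scrapy.Spider):
        # Hypothetical spider: each row becomes its own item, so no single
        # item approaches the 1 MB serialized-size limit.
        name = "paginated_example"
        start_urls = ["https://example.com/list?page=1"]

        def parse(self, response):
            for row in response.css("table.results tr"):
                yield {
                    "source_page": response.url,
                    "row_html": row.get(),
                }
            next_page = response.css("a.next::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)

For the Amazon S3 option, Scrapy's built-in feed exports can write items straight to a bucket; a sketch assuming Scrapy 2.1+ (the FEEDS setting), a placeholder bucket name and credentials, and botocore installed:

    # settings.py
    FEEDS = {
        "s3://my-bucket/items/%(name)s/%(time)s.jl": {"format": "jsonlines"},
    }
    AWS_ACCESS_KEY_ID = "REPLACE_ME"
    AWS_SECRET_ACCESS_KEY = "REPLACE_ME"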
Mattia Ferrini said over 5 years ago
From time to time, my scraper is not able to parse the HTML, so I am trying to get access to the raw HTML. I have enabled the Page Storage add-on, but I get a warning that says "Page not saved, body too large: ".
Any workaround?
SAI KATTA said over 2 years ago
Hi,
In my case the extracted data is assigned to a few fields and returned as JSON, and all of it is returned as one item under the Zyte Items tab:
Response = {
    Pagename: ...,
    Html content: ...,
    Downloaded pdf: ...
}
Is the 1 MB limitation applied to the entire JSON response, or to each field (page name, HTML content, and downloaded PDF) individually? Could you please confirm?
- Unable to select Scrapy project in GitHub
- ScrapyCloud can't call spider?
- Unhandled error in Deferred
- Item API - Filtering
- newbie to web scraping but need data from zillow
- ValueError: Invalid control character
- Cancelling account
- Best Practices
- Beautifulsoup with ScrapingHub
- Delete a project in ScrapingHub