Answered

Rejected message because it was too big: ITM {

Apparently, there is a 1 MB limit on serialized items. Is there a way to remove it? I need at least 6 MB.


Best Answer

There's no way to remove the limit. Depending on your use case, there are a few workarounds: split your items into several smaller ones, for example one item per page of a paginated list instead of one accumulated item; enable the Page Storage addon and access the raw HTML pages from Collections (if you are storing raw HTML as an item); or store your items in Amazon S3 using a feed export.
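The first workaround (splitting accumulated data into several items) can be sketched as a small helper that batches records so each emitted item stays under the size cap. This is a minimal sketch, not part of the platform API; the `split_records` name, the `rows` key, and the byte accounting are assumptions for illustration:

```python
import json

MAX_ITEM_BYTES = 1_000_000  # assumed ~1 MB serialized-item cap

def split_records(records, max_bytes=MAX_ITEM_BYTES):
    """Yield items of the form {"rows": [...]}, each batch kept under
    roughly max_bytes when serialized as JSON (the wrapping dict adds
    a few extra bytes on top of this estimate)."""
    batch, size = [], 2  # 2 bytes for the surrounding "[]"
    for rec in records:
        # size of this record plus a ", " separator
        rec_size = len(json.dumps(rec).encode("utf-8")) + 2
        if batch and size + rec_size > max_bytes:
            yield {"rows": batch}
            batch, size = [], 2
        batch.append(rec)
        size += rec_size
    if batch:
        yield {"rows": batch}
```

In a spider, you would yield each batch as its own item instead of accumulating everything from a paginated list into one oversized item.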



From time to time, my scraper is not able to parse the HTML.

I am trying to get access to the raw HTML. I have enabled the Page Storage addon and I raise an error, but I get a warning that says "Page not saved, body too large: ".

Any workaround?


Hi, in my case the extracted data is assigned to a few variables and returned as JSON, something like: Response = { Pagename: …, Html content: …, Downloaded pdf: … }. All of these are returned as one item under the Zyte items tab. Is the 1 MB limit on the entire response, or on the pagename, HTML content, and PDF content individually? Could you please confirm?