
From local Scrapy to Scrapy Cloud - Unexpected results

Hi everybody,


The scraper I deployed on Scrapy Cloud is producing unexpected results compared to the local version.

My local version easily extracts every field of a product item (from an online retailer), but on Scrapy Cloud the fields "ingredients" and "list of prices" always come back empty.

In the attached picture you'll see the two elements that are always empty on Scrapy Cloud, whereas they work perfectly locally.

I'm using Python 3, and the stack is configured as scrapy:1.3-py3.

At first I thought it was an issue with the regex and Unicode, but it seems not.

So I tried everything: ur, ur RE.ENCODE... and nothing worked.
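A note on those attempts: in Python 3 the `ur''` string prefix is a syntax error, and `str` patterns already match Unicode text, so no extra flag is needed. One Unicode pitfall that can still make `[ée]` fail is text containing a decomposed "é" (a plain "e" followed by the combining accent U+0301). Whether this site ever serves such text is an assumption, not something confirmed in the thread; the sketch below (the names `LABEL` and `find_ingredients` are hypothetical) normalizes to NFC before matching so both forms are handled.

```python
import re
import unicodedata

# "Ingrédients" / "Ingredients" label, optional colon; \u00e9 is the precomposed é
LABEL = re.compile(r'[Ii]ngr[\u00e9e]dients\s*:?\s*(.*)')


def find_ingredients(text):
    """Return the text after the ingredients label, or '' if there is none.

    The input is normalized to NFC first, so "e" + U+0301 becomes the single
    precomposed character that the [\u00e9e] class expects.
    """
    match = LABEL.search(unicodedata.normalize('NFC', text))
    return match.group(1).strip() if match else ''


decomposed = 'Ingre\u0301dients : sucre, farine'
print(LABEL.search(decomposed))      # None: [\u00e9e] cannot match "e" + combining accent
print(find_ingredients(decomposed))  # sucre, farine
```

If the raw pattern fails but the normalized version matches on text copied from the cloud job, normalization would be worth adding to the spider.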



For the ingredients, my code is the following:

 

        # Join every text node of the ingredients tab into a single string
        data_box = response.xpath('//*[@id="ingredients"]').css('div.information__tab__content *::text').extract()
        data_inter = ''.join(data_box).strip()

        # Try the "Ingrédients :" label first, then fall back to "Composition :"
        # (needs "import re" at the top of the spider module)
        match = (re.search(r'[Ii]ngr[ée]dients\s*:?\s*(.*)', data_inter)
                 or re.search(r'[Cc]omposition\s*:?\s*(.*)', data_inter))

        if match:
            # Keep the text after the label: drop quotes and dots, turn ";" into ","
            ingredients = match.group(1).replace('"', '').replace('.', '').replace(';', ',').strip()
        else:
            ingredients = ''

It seems that the regex never matches on Scrapy Cloud.
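To tell whether the regex or the page text is at fault, the same matching and clean-up can be run outside Scrapy on the joined `data_inter` string copied from a local run and from a Scrapy Cloud job. This is a minimal sketch; `PATTERNS` and `clean_ingredients` are hypothetical names, and the sample strings are made up.

```python
import re

# Labels tried in order, mirroring the spider: "Ingrédients" first, then "Composition"
PATTERNS = [
    re.compile(r'[Ii]ngr[\u00e9e]dients\s*:?\s*(.*)'),
    re.compile(r'[Cc]omposition\s*:?\s*(.*)'),
]


def clean_ingredients(text):
    """Apply the spider's matching and clean-up to plain text, without Scrapy."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # Same clean-up chain as the spider
            return (match.group(1)
                    .replace('"', '')
                    .replace('.', '')
                    .replace(';', ',')
                    .strip())
    return ''


print(clean_ingredients('Ingrédients : farine; sucre; sel.'))  # farine, sucre, sel
print(clean_ingredients('Composition: coton 100%.'))          # coton 100%
print(repr(clean_ingredients('Nutrition: 12 g')))              # ''
```

Note that `.` does not match newlines, so only the first line after the label is captured; passing `re.DOTALL` would change that if multi-line ingredient lists are expected.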



For the prices, my code is the following:


 

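        list_prices = []

        # list_packaging is built earlier in the callback (one selector per package size)
        for package in list_packaging:
            tonnage = package.css('div.product__varianttitle::text').extract_first(default='').strip()
            # Price per kg such as "(12,50 € / kg)"; all matches are joined into one string
            prix_inter = ''.join(package.css('span.product__smallprice__text').re(r'\(\s*\d+,\d*\s*€\s*/\s*kg\)'))
            # Keep only the number with a decimal point: "(12,50 € / kg)" -> "12.50"
            prix = prix_inter.replace('(', '').replace(')', '').replace('/', '').replace('€', '').replace('kg', '').replace(',', '.').strip()

            list_prices.append(prix)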

 

That's the same story. Still empty. 
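The price regex only matches a comma as decimal separator followed by the literal "€" and "/ kg". If the cloud job receives the page formatted differently (another locale or currency, for example; this is an assumption, not confirmed in the thread), nothing matches and the joined string is simply ''. A capture group around the number avoids the long `replace` chain; the sketch below uses the standard library only, and `PRICE_PER_KG` and `prices_per_kg` are hypothetical names with made-up sample inputs.

```python
import re

# Same shape as the spider's regex, with a capture group around the number
PRICE_PER_KG = re.compile(r'\(\s*(\d+,\d*)\s*€\s*/\s*kg\)')


def prices_per_kg(text):
    """Return every "(x,yz € / kg)" price found in text as a float."""
    # With one group, findall returns just the captured numbers
    return [float(raw.replace(',', '.')) for raw in PRICE_PER_KG.findall(text)]


print(prices_per_kg('<span>2,99 € (5,98 € / kg)</span>'))  # [5.98]
print(prices_per_kg('<span>(11,00 €/kg)</span>'))         # [11.0]
print(prices_per_kg('<span>(5.98 EUR / kg)</span>'))      # []
```

Running this on the span HTML saved from a cloud job shows immediately whether the format differs from what the regex expects.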




I repeat: it works fine locally.

These two fields are the only ones causing problems: I'm extracting plenty of other data on Scrapy Cloud (with regexes too), and I'm very satisfied with it.
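Since the same code behaves differently in the two environments, one way to check whether Scrapy Cloud actually receives a different page is to save `response.text` from both runs (for example by writing it to the job log) and diff the two copies. This is a minimal stdlib sketch; `diff_pages` is a hypothetical helper, and the "cloud" HTML below is invented purely for illustration.

```python
import difflib


def diff_pages(local_html, cloud_html, context=1):
    """Return unified-diff lines between two saved copies of the same page."""
    return list(difflib.unified_diff(
        local_html.splitlines(),
        cloud_html.splitlines(),
        fromfile='local',
        tofile='cloud',
        n=context,      # lines of unchanged context around each change
        lineterm='',    # splitlines() already removed the newlines
    ))


# Hypothetical example: the cloud copy lacks the ingredients text
local = '<div id="ingredients">\nIngrédients : farine\n</div>'
cloud = '<div id="ingredients">\n<!-- loaded by JavaScript -->\n</div>'
for line in diff_pages(local, cloud):
    print(line)
```

An empty result means both environments saw identical HTML, which would point back at the extraction code rather than the page.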



Any ideas, guys?



1 Comment

Hi, 


The fields ingredient and list_price seem to have been extracted in a recent job. It would be great if you could share what changes you made to get these fields, as this would help other users who face a similar issue.
