sh_scrapy.extension - Wrong item type: None

Posted about 4 years ago by Davide

Unanswered

Davide

I'm trying to run my CrawlSpider on Zyte, but I get a very annoying error:

[sh_scrapy.extension] Wrong item type: None

I followed the documentation to create a crawler that extracts all links from a specific web page. When I start the job on Zyte, the spider sends the request correctly and then immediately returns the error.

The code for this CrawlSpider is simple and minimal. This is the part responsible for creating the Rules and the LinkExtractor instance:

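    self.restrict_css = [self.selector_item]
    if self.selector_next_page:
        self.restrict_css.append(self.selector_next_page)

    # CrawlSpider compiles self.rules inside its own __init__, so if this
    # code runs in the spider's __init__ it must come before super().__init__().
    self.rules = (
        Rule(
            LinkExtractor(
                # Replaces (does not extend) Scrapy's default ignored extensions.
                deny_extensions=["css", "js"],
                unique=True,
                # Only extract links from the page regions matched by these selectors.
                restrict_css=self.restrict_css,
                # Called on each extracted URL; returning None drops that link.
                process_value=lambda value: check_noindex_nofollow(value),
            ),
            # Name of a spider method that filters each list of extracted links.
            process_links="ignore_nofollow_noindex",
            # Scrapy's CrawlSpider docs advise against "parse" as a rule callback.
            callback="parse",
            follow=True,
        ),
    )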

Basically, this code:

    self.restrict_css = [self.selector_item]
    if self.selector_next_page:
        self.restrict_css.append(self.selector_next_page)

creates a list with one element if the site has no next-page selector, or two elements if it has one. This list limits link extraction to specific regions of each page, which is what this argument does:

    restrict_css=self.restrict_css,
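The list-building logic above can be pulled out into a small function to check both cases. This is a minimal sketch: the function name `build_restrict_css` and the example selectors are hypothetical stand-ins for the spider's `selector_item` and `selector_next_page` attributes.

```python
from typing import List, Optional


def build_restrict_css(selector_item: str, selector_next_page: Optional[str] = None) -> List[str]:
    """Hypothetical helper mirroring the spider code above.

    Always include the item selector; add the next-page selector only
    when one is configured (None and empty strings are skipped).
    """
    restrict_css = [selector_item]
    if selector_next_page:
        restrict_css.append(selector_next_page)
    return restrict_css


# Site without pagination: one selector.
print(build_restrict_css("div.product a"))  # ['div.product a']
# Site with pagination: two selectors.
print(build_restrict_css("div.product a", "a.next"))  # ['div.product a', 'a.next']
```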

The parse method is:

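    def parse(self, response):
        # Build a PageLink item from the fetched page.
        item = ItemLoader(item=PageLink(), response=response)
        item.add_css("name", "title::text")
        item.add_value("url", response.url)
        item.add_css("image", "img::attr(src)")
        # "depth" is set in response.meta by Scrapy's DepthMiddleware.
        item.add_value("depth", response.meta["depth"])
        item.add_value("timestamp", self.timestamp)
        # load_item() returns the populated PageLink instance.
        yield item.load_item()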

PageLink is a scrapy.Item declared in a separate file and imported into the CrawlSpider module, so the spider knows about it.

If I start the spider with a specific start URL and specific CSS rules, it logs [sh_scrapy.extension] Wrong item type: None immediately after sending the requests, and I don't know why. The only thing I found is the line of code that fires this error.
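The message suggests that the object reaching the extension was None rather than an item. As a rough illustration only (this is not sh_scrapy's actual source, and `check_item_type` is an invented name), a type check of this shape would produce exactly that log line:

```python
import logging

logging.basicConfig(level=logging.INFO)
# Logger name chosen only to match the log prefix; this is not the real extension.
logger = logging.getLogger("sh_scrapy.extension")


def check_item_type(item) -> bool:
    """Hypothetical, simplified check: accept plain dicts, log anything else."""
    if not isinstance(item, dict):
        logger.error("Wrong item type: %s", item)
        return False
    return True


check_item_type({"url": "https://example.com"})  # accepted, nothing logged
check_item_type(None)  # logs an error containing "Wrong item type: None"
```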

Has anyone experienced this before? How can I fix this very annoying problem?

Thank you very much

1 Comment

Joachim Hertel posted over 3 years ago

I get the same error message, but in a different context. Is it possible that the error occurs when a spider has too many pipelines?
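One possible explanation, offered as an assumption rather than a confirmed diagnosis: Scrapy passes each item through the enabled pipelines in order, and its documentation says process_item() must return an item (or a Deferred) or raise DropItem. A pipeline that forgets `return item` implicitly returns None, so later stages receive None instead of the item. The plain-Python simulation below uses hypothetical pipeline classes and a simplified `run_pipelines` stand-in (not Scrapy's real code) to show that the number of pipelines does not matter, only whether each one returns the item.

```python
class AddSourcePipeline:
    """Hypothetical pipeline that enriches the item and returns it correctly."""

    def process_item(self, item, spider):
        item["source"] = "zyte"
        return item


class BrokenLoggingPipeline:
    """Hypothetical pipeline that forgets to return the item."""

    def process_item(self, item, spider):
        print("saw item:", item.get("url"))
        # Missing `return item`: Python implicitly returns None here.


def run_pipelines(item, pipelines, spider=None):
    """Simplified stand-in for a pipeline chain: each output feeds the next stage."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


item = {"url": "https://example.com"}
# Two well-behaved pipelines: the item survives.
print(run_pipelines(dict(item), [AddSourcePipeline(), AddSourcePipeline()]))
# One pipeline without `return item`: the chain ends with None.
print(run_pipelines(dict(item), [AddSourcePipeline(), BrokenLoggingPipeline()]))
```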
