I'm having trouble scraping this page: http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo
My scraper gets all of the links to the subpages and scrapes those correctly (25 results), but it isn't correctly submitting the form request to get the next 25 results to scrape (and so on). I would appreciate any help anyone can offer. Thanks!
import scrapy


class ParcelScraperSpider(scrapy.Spider):
    name = 'parcel_scraper'
    start_urls = ['http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo',
                  'http://maps.kalkaskacounty.net/,']

    def parse(self, response):
        # Follow each parcel link on the results page.
        for href in response.css('a.PDBlistlink::attr(href)'):
            yield response.follow(href, self.parse_details)

    def next_group(self, response):
        # Submit the PDBquery form with DBVpage set to 'next'.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'DBVpage': 'next'},
            formname={'PDBquery'},
            callback=self.parse,
        )

    def parse_details(self, response):
        # Extract the parcel fields from a detail page.
        yield {
            'owner_name': response.xpath('//td[contains(text(),"Owner Name")]/following::td[1]/text()').extract_first(),
            'jurisdiction': response.xpath('//td[contains(text(),"Jurisdiction")]/following::td[1]/text()').extract_first(),
            'property_street': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[1]/text()').extract_first(),
            'property_csz': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[2]/text()').extract_first(),
            'owner_street': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[1]/text()').extract_first(),
            'owner_csz': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[2]/text()').extract_first(),
            'current_tax_value': response.xpath('//td[contains(text(),"Current Taxable Value")]/following::td[1]/text()').extract_first(),
            'school_district': response.xpath('//td[contains(text(),"School District")]/following::td[1]/text()').extract_first(),
            'current_assess': response.xpath('//td[contains(text(),"Current Assessment")]/following::td[1]/text()').extract_first(),
            'current_sev': response.xpath('//td[contains(text(),"Current S.E.V.")]/following::td[1]/text()').extract_first(),
            'current_pre': response.xpath('//td[contains(text(),"Current P.R.E.")]/following::td[1]/text()').extract_first(),
            'prop_class': response.xpath('//td[contains(text(),"Current Property Class")]/following::td[1]/text()').extract_first(),
            'tax_desc': response.xpath('//h3[contains(text(),"Tax Description")]/following::div/text()').extract_first(),
        }
thriveni posted over 6 years ago Admin Best Answer
The next_group method needs to be called recursively to follow the "Next" button. Currently it is not invoked anywhere in the spider code, so only the first 25 records are retrieved. See https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links for how to follow links in Scrapy.
Jacob Makowski posted over 6 years ago
Thanks, thriveni
I added the next_group method but am not getting more than 25 records. The next page link looks like this:
<a class="DBVpagelink" href="javascript:document.PDBquery.DBVpage.value='next';document.PDBquery.submit();">next ></a>
Which is why I was asking about using FormRequest. Thanks!
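A link like the one above does not point at a URL. It sets a form field and submits the form, so the request to reproduce is a form submission. As a rough sketch (the helper name and the regular expression are made up for illustration and only cover this exact link pattern), the form name and the field to send can be read straight out of the href:

```python
import re

# Matches document.<form>.<field>.value='<value>' inside a javascript: href.
ASSIGNMENT = re.compile(r"document\.(\w+)\.(\w+)\.value='([^']*)'")


def form_submission_from_href(href):
    """Return (form_name, formdata) for a form-submitting link, else None."""
    match = ASSIGNMENT.search(href)
    if match is None:
        return None
    form_name, field, value = match.groups()
    return form_name, {field: value}


href = ("javascript:document.PDBquery.DBVpage.value='next';"
        "document.PDBquery.submit();")
print(form_submission_from_href(href))
# prints ('PDBquery', {'DBVpage': 'next'})
```

The two parts of the result correspond to the formname and formdata arguments that FormRequest.from_response accepts.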