Pagination of ASP site with FormRequest for next page

Posted over 6 years ago by Jacob Makowski

Answered
Jacob Makowski

I'm having trouble scraping this page: http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo  


My scraper gets all of the links to the sub pages and scrapes those correctly (25 results), but it isn't correctly submitting the form request to get the next 25 results to scrape (and so on). I would appreciate any help anyone can offer. Thanks!


 

```python
import scrapy


class ParcelScraperSpider(scrapy.Spider):
    name = 'parcel_scraper'

    start_urls = ['http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo',
                  'http://maps.kalkaskacounty.net/']

    def parse(self, response):
        # Queue a detail request for each parcel link on the results page.
        for href in response.css('a.PDBlistlink::attr(href)'):
            yield response.follow(href, self.parse_details)

    def next_group(self, response):
        # Submit the PDBquery form with DBVpage='next' to load the next 25 results.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'DBVpage': 'next'},
            formname='PDBquery',
            callback=self.parse,
        )

    def parse_details(self, response):
        # Each value sits in the first <td> after the cell holding its label.
        yield {
            'owner_name': response.xpath('//td[contains(text(),"Owner Name")]/following::td[1]/text()').extract_first(),
            'jurisdiction': response.xpath('//td[contains(text(),"Jurisdiction")]/following::td[1]/text()').extract_first(),
            'property_street': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[1]/text()').extract_first(),
            'property_csz': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[2]/text()').extract_first(),
            'owner_street': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[1]/text()').extract_first(),
            'owner_csz': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[2]/text()').extract_first(),
            'current_tax_value': response.xpath('//td[contains(text(),"Current Taxable Value")]/following::td[1]/text()').extract_first(),
            'school_district': response.xpath('//td[contains(text(),"School District")]/following::td[1]/text()').extract_first(),
            'current_assess': response.xpath('//td[contains(text(),"Current Assessment")]/following::td[1]/text()').extract_first(),
            'current_sev': response.xpath('//td[contains(text(),"Current S.E.V.")]/following::td[1]/text()').extract_first(),
            'current_pre': response.xpath('//td[contains(text(),"Current P.R.E.")]/following::td[1]/text()').extract_first(),
            'prop_class': response.xpath('//td[contains(text(),"Current Property Class")]/following::td[1]/text()').extract_first(),
            'tax_desc': response.xpath('//h3[contains(text(),"Tax Description")]/following::div/text()').extract_first(),
        }
```



 


thriveni posted over 6 years ago Admin Best Answer

The next_group method needs to be called recursively to follow the "Next" button. Currently it's not invoked anywhere in the spider code, hence only the first 25 records are retrieved. You can refer to https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links to learn how to follow links in Scrapy.




Jacob Makowski posted over 6 years ago

Thanks, thriveni


I added the next_group method but am not getting more than 25 records. The next page link looks like this:


<a class="DBVpagelink" href="javascript:document.PDBquery.DBVpage.value='next';document.PDBquery.submit();">next &gt;</a>


Which is why I was asking about using FormRequest. Thanks!
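Scrapy does not execute JavaScript, so that `href` cannot be followed as a URL; it has to be translated into a form submission. The sketch below pulls the form name, field, and value out of such an `href` with a regular expression. The `parse_js_submit` helper and `JsFormSubmit` type are hypothetical, written only for illustration, and the pattern assumes exactly the `document.FORM.FIELD.value='VALUE';document.FORM.submit();` shape of the link quoted above.

```python
import re
from typing import NamedTuple, Optional


class JsFormSubmit(NamedTuple):
    form_name: str
    field: str
    value: str


# Matches hrefs shaped like:
#   javascript:document.FORM.FIELD.value='VALUE';document.FORM.submit();
# The backreference (?P=form) requires both statements to name the same form.
_JS_SUBMIT = re.compile(
    r"document\.(?P<form>\w+)\.(?P<field>\w+)\.value='(?P<value>[^']*)';"
    r"\s*document\.(?P=form)\.submit\(\)"
)


def parse_js_submit(href: str) -> Optional[JsFormSubmit]:
    """Return the form/field/value a JavaScript submit link sets, or None."""
    m = _JS_SUBMIT.search(href)
    if m is None:
        return None
    return JsFormSubmit(m["form"], m["field"], m["value"])
```

For the link above this gives `form_name='PDBquery'`, `field='DBVpage'`, `value='next'`, which map directly onto `scrapy.FormRequest.from_response(response, formname=..., formdata={field: value})`. Hard-coding those values, as `next_group` already does, is just as valid while the link never changes; parsing them only pays off if the site varies them.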
