
Pagination of an ASP site with FormRequest for the next page

I'm having trouble scraping this page: http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo  


My scraper gets all of the links to the sub pages and scrapes those correctly (25 results), but it isn't correctly submitting the form request to get the next 25 results to scrape (and so on). I would appreciate any help anyone can offer. Thanks!


 

import scrapy


class ParcelScraperSpider(scrapy.Spider):
    name = 'parcel_scraper'
    start_urls = ['http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo',
                  'http://maps.kalkaskacounty.net/,']

    def parse(self, response):
        for href in response.css('a.PDBlistlink::attr(href)'):
            yield response.follow(href, self.parse_details)

    def next_group(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'DBVpage': 'next'},
            formname={'PDBquery'},
            callback=self.parse,
        )

    def parse_details(self, response):
        yield {
            'owner_name': response.xpath('//td[contains(text(),"Owner Name")]/following::td[1]/text()').extract_first(),
            'jurisdiction': response.xpath('//td[contains(text(),"Jurisdiction")]/following::td[1]/text()').extract_first(),
            'property_street': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[1]/text()').extract_first(),
            'property_csz': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[2]/text()').extract_first(),
            'owner_street': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[1]/text()').extract_first(),
            'owner_csz': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[2]/text()').extract_first(),
            'current_tax_value': response.xpath('//td[contains(text(),"Current Taxable Value")]/following::td[1]/text()').extract_first(),
            'school_district': response.xpath('//td[contains(text(),"School District")]/following::td[1]/text()').extract_first(),
            'current_assess': response.xpath('//td[contains(text(),"Current Assessment")]/following::td[1]/text()').extract_first(),
            'current_sev': response.xpath('//td[contains(text(),"Current S.E.V.")]/following::td[1]/text()').extract_first(),
            'current_pre': response.xpath('//td[contains(text(),"Current P.R.E.")]/following::td[1]/text()').extract_first(),
            'prop_class': response.xpath('//td[contains(text(),"Current Property Class")]/following::td[1]/text()').extract_first(),
            'tax_desc': response.xpath('//h3[contains(text(),"Tax Description")]/following::div/text()').extract_first()
        }



 


Best Answer

The next_group method has to be invoked from parse on every results page so that the spider keeps following the "Next" button. Currently it is never invoked in the spider code, so only the first 25 records are retrieved. See https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links for how to follow links in Scrapy.


Thanks, thriveni


I added the next_group method but am still not getting more than 25 records. The next page link looks like this:


<a class="DBVpagelink" href="javascript:document.PDBquery.DBVpage.value='next';document.PDBquery.submit();">next &gt;</a>


Which is why I was asking about using FormRequest. Thanks!
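For reference, that link does not point at a URL. Its JavaScript sets the DBVpage field of the form named PDBquery to 'next' and then submits the form. The sketch below uses only the standard library to show the payload such a submit would carry, which is roughly what FormRequest.from_response assembles. SAMPLE_HTML, the PDBsearch hidden field, and the helper names are hypothetical; the real form's fields and method are not shown in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and attrs.get('name') == self.form_name:
            self.in_form = True
        elif tag == 'input' and self.in_form and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def next_page_payload(html, form_name='PDBquery'):
    """Return the urlencoded form data with DBVpage overridden to 'next'."""
    collector = FormFieldCollector(form_name)
    collector.feed(html)
    fields = dict(collector.fields)
    fields['DBVpage'] = 'next'  # what the JavaScript link sets before submitting
    return urlencode(fields)


# Hypothetical markup standing in for the real search results page.
SAMPLE_HTML = '''
<form name="PDBquery" method="post" action="propertysearch.asp">
  <input type="hidden" name="PDBsearch" value="setdo">
  <input type="hidden" name="DBVpage" value="">
</form>
'''

print(next_page_payload(SAMPLE_HTML))  # PDBsearch=setdo&DBVpage=next
```

The other form fields travel along with DBVpage, which is why building the request from the response (rather than posting DBVpage alone) is usually the safer choice.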
