Pagination of an ASP site with FormRequest for the next page

Posted about 7 years ago by Jacob Makowski


Answered

Jacob Makowski

I'm having trouble scraping this page: http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo

My scraper gets all of the links to the sub-pages and scrapes those correctly (25 results), but it isn't correctly submitting the form request to get the next 25 results to scrape (and so on). I would appreciate any help anyone can offer. Thanks!

import scrapy


class ParcelScraperSpider(scrapy.Spider):
    name = 'parcel_scraper'
    start_urls = ['http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo',
                  'http://maps.kalkaskacounty.net/']

    def parse(self, response):
        # Follow each parcel link on the current results page
        for href in response.css('a.PDBlistlink::attr(href)'):
            yield response.follow(href, self.parse_details)

    def next_group(self, response):
        # Re-submit the search form asking for the next page of results
        return scrapy.FormRequest.from_response(
            response,
            formdata={'DBVpage': 'next'},
            formname='PDBquery',
            callback=self.parse,
        )

    def parse_details(self, response):
        # Each value sits in the <td> that follows the <td> holding its label
        yield {
            'owner_name': response.xpath('//td[contains(text(),"Owner Name")]/following::td[1]/text()').extract_first(),
            'jurisdiction': response.xpath('//td[contains(text(),"Jurisdiction")]/following::td[1]/text()').extract_first(),
            'property_street': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[1]/text()').extract_first(),
            'property_csz': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[2]/text()').extract_first(),
            'owner_street': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[1]/text()').extract_first(),
            'owner_csz': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[2]/text()').extract_first(),
            'current_tax_value': response.xpath('//td[contains(text(),"Current Taxable Value")]/following::td[1]/text()').extract_first(),
            'school_district': response.xpath('//td[contains(text(),"School District")]/following::td[1]/text()').extract_first(),
            'current_assess': response.xpath('//td[contains(text(),"Current Assessment")]/following::td[1]/text()').extract_first(),
            'current_sev': response.xpath('//td[contains(text(),"Current S.E.V.")]/following::td[1]/text()').extract_first(),
            'current_pre': response.xpath('//td[contains(text(),"Current P.R.E.")]/following::td[1]/text()').extract_first(),
            'prop_class': response.xpath('//td[contains(text(),"Current Property Class")]/following::td[1]/text()').extract_first(),
            'tax_desc': response.xpath('//h3[contains(text(),"Tax Description")]/following::div/text()').extract_first(),
        }


thriveni posted about 7 years ago Admin Best Answer

The next_group method needs to be called recursively to follow the "Next" button. Currently it is never invoked anywhere in the spider code, hence only the first 25 records are retrieved. You can refer to https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links to see how to follow links in Scrapy.


2 Comments

Jacob Makowski posted about 7 years ago

Thanks, thriveni

I added the next_group method but am still not getting more than 25 records. The next-page link looks like this:

That is why I was asking about using FormRequest. Thanks!


thriveni posted about 7 years ago Admin Answer
