Jacob Makowski
I'm having trouble scraping this page: http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo
My scraper finds all of the links to the sub pages on the first page of results and scrapes those correctly (25 results), but it isn't correctly submitting the form request to get the next 25 results to scrape (and so on). I would appreciate any help anyone can offer. Thanks!
import scrapy

class ParcelScraperSpider(scrapy.Spider):
    name = 'parcel_scraper'
    start_urls = ['http://maps.kalkaskacounty.net/propertysearch.asp?PDBsearch=setdo',
                  'http://maps.kalkaskacounty.net/,']

    def parse(self,response):
        for href in response.css('a.PDBlistlink::attr(href)'):
            yield response.follow(href, self.parse_details)

    def next_group(self,response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'DBVpage':'next'},
            formname={'PDBquery'},
            callback=self.parse,
        )

    def parse_details(self,response):
        yield {
            'owner_name': response.xpath('//td[contains(text(),"Owner Name")]/following::td[1]/text()').extract_first(),
            'jurisdiction': response.xpath('//td[contains(text(),"Jurisdiction")]/following::td[1]/text()').extract_first(),
            'property_street': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[1]/text()').extract_first(),
            'property_csz': response.xpath('//td[contains(text(),"Property Address")]/following::td[1]/div[2]/text()').extract_first(),
            'owner_street': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[1]/text()').extract_first(),
            'owner_csz': response.xpath('//td[contains(text(),"Owner Address")]/following::td[1]/div[2]/text()').extract_first(),
            'current_tax_value': response.xpath('//td[contains(text(),"Current Taxable Value")]/following::td[1]/text()').extract_first(),
            'school_district': response.xpath('//td[contains(text(),"School District")]/following::td[1]/text()').extract_first(),
            'current_assess': response.xpath('//td[contains(text(),"Current Assessment")]/following::td[1]/text()').extract_first(),
            'current_sev': response.xpath('//td[contains(text(),"Current S.E.V.")]/following::td[1]/text()').extract_first(),
            'current_pre': response.xpath('//td[contains(text(),"Current P.R.E.")]/following::td[1]/text()').extract_first(),
            'prop_class': response.xpath('//td[contains(text(),"Current Property Class")]/following::td[1]/text()').extract_first(),
            'tax_desc': response.xpath('//h3[contains(text(),"Tax Description")]/following::div/text()').extract_first()
        }
thriveni
The next_group method needs to be called recursively to follow the "Next" button. Currently it is not invoked anywhere in the spider code, so only the first 25 records are retrieved. See https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links for how to follow links in Scrapy.
Jacob Makowski
Thanks, thriveni
I added the next_group method but am still not getting more than 25 records. The next page link looks like this:
<a class="DBVpagelink" href="javascript:document.PDBquery.DBVpage.value='next';document.PDBquery.submit();">next ></a>
Which is why I was asking about using FormRequest. Thanks!
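For debugging, it may help to spell out what that javascript: href does: it sets the DBVpage field of the form named PDBquery to 'next' and submits the form with all of its other fields unchanged. The standard-library sketch below rebuilds that payload from a page's HTML so it can be compared with what the browser sends and with the body of the request Scrapy builds. The helper names (FormFieldCollector, next_page_payload) and the extra hidden field in the usage example are hypothetical; the real form's fields are not shown in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect the <input> name/value pairs of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and attrs.get('name') == self.form_name:
            self.in_form = True
        elif tag == 'input' and self.in_form and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def next_page_payload(html, form_name='PDBquery'):
    """Return the URL-encoded fields the "next" link would submit."""
    collector = FormFieldCollector(form_name)
    collector.feed(html)
    fields = dict(collector.fields)
    fields['DBVpage'] = 'next'  # what the javascript: href assigns before submit()
    return urlencode(fields)


# Usage with a made-up form; the PDBsearch hidden field is hypothetical.
sample = (
    '<form name="PDBquery" method="post">'
    '<input type="hidden" name="DBVpage" value="">'
    '<input type="hidden" name="PDBsearch" value="setdo">'
    '</form>'
)
print(next_page_payload(sample))
```

This mirrors, in simplified form, what FormRequest.from_response does with formdata: keep the form's existing inputs and override only the named ones.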