Need help with a YellowPages spider

I'm new to Scrapy, and so far I've been able to create a few spiders. I want to write a spider that crawls Yellowpages looking for websites that return a 404 response. The spider works fine, but pagination doesn't. Any help is appreciated; thanks in advance.
# -*- coding: utf-8 -*-
import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    # allowed_domains = ['www.yellowpages.com']
    start_urls = ['https://www.yellowpages.com/search?search_terms=handyman&geo_location_terms=Miami%2C+FL']

    def parse(self, response):
        for listing in response.css('div.search-results.organic div.srp-listing'):
            url = listing.css('a.track-visit-website::attr(href)').extract_first()
            yield scrapy.Request(url=url, callback=self.parse_details)

        # follow pagination links
        next_page_url = response.css('a.next.ajax-page::attr(href)').extract_first()
        next_page_url = response.urljoin(next_page_url)
        if next_page_url:
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        yield {'Response': response}
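One likely cause (an assumption, since I can't run this against the live page): some listings have no `track-visit-website` link, so `extract_first()` returns `None`, and `scrapy.Request(url=None)` raises a `ValueError` inside `parse`. Because `parse` is a generator, that exception ends it before the pagination request is ever yielded. A minimal plain-Python sketch of that failure mode, with no Scrapy dependency (the dict "requests" and the helper name are made up for illustration):

```python
def parse_like_generator(listing_urls, next_page_url):
    """Mimics the spider's parse(): yield one request per listing,
    then the pagination request at the end."""
    for url in listing_urls:
        if url is None:
            # scrapy.Request(url=None) raises ValueError in the real spider
            raise ValueError("Request url must be str, got NoneType")
        yield {"url": url}
    # this line is never reached if any listing URL was None
    yield {"url": next_page_url, "callback": "parse"}


# One listing without a website link kills the generator before pagination:
results = []
try:
    for req in parse_like_generator(["http://a.example", None, "http://b.example"], "page=2"):
        results.append(req)
except ValueError:
    pass
# results holds only the first listing; the next-page request was never yielded
```

If that's the cause here, the fix would be to yield the listing request only when `url` is truthy (`if url: yield scrapy.Request(...)`). It's also safer to call `response.urljoin(next_page_url)` only inside the `if next_page_url:` check: `urljoin` of `None` returns the current page's URL, so the spider would re-request the same page and the dupefilter would drop it.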
Hi David, this is my first post here and I had trouble formatting the code. My question is simple: I have a pagination problem with this spider, and I'm not sure what I'm missing here. – oscarQ