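I have a link: https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP1.htm. How can I increment the page number in this link?
I want to increment the link like this: https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP2.htm
and then 3, 4, 5, and so on. My code is: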
# -*- coding: utf-8 -*-
import scrapy


class GlassdoorSpider(scrapy.Spider):
    name = 'glassdoor'
    #allowed_domains = ['https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11.htm']
    start_urls = ['https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP1.htm']

    def parse(self, response):
        #main_url = "https://www.glassdoor.ca"
        urls = response.css('li.jl > div > div.flexbox > div > a::attr(href)').extract()
        for url in urls:
            url = "https://www.glassdoor.ca" + url
            yield scrapy.Request(url = url, callback = self.parse_details)
        next_page_url = "https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP"
        if next_page_url:
            #next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url = next_page_url, callback = self.parse)

    def parse_details(self,response):
        yield{
            'Job_Title' : response.css('div.header.cell.info > h2::text').extract()
        }
        self.log("reached22: "+ response.url)
I want to increment the page number in the variable next_page_url.
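One possible way to do this, as a minimal sketch: the helper below reads the number after `_IP` in a URL shaped like the ones above and returns the URL for the following page. The name `next_page_url` as a function, the `max_page` parameter, and the assumption that every listing URL ends in `_IP<number>.htm` are illustrative and not taken from the original spider.

```python
import re

# Hypothetical helper (not part of the original spider).
# Assumption: listing URLs end with "_IP<number>.htm".
_PAGE_RE = re.compile(r"_IP(\d+)\.htm$")


def next_page_url(url, max_page=None):
    """Return the URL of the next page, or None if there is none to build."""
    match = _PAGE_RE.search(url)
    if match is None:
        # The URL does not carry a page number in the expected place.
        return None
    page = int(match.group(1)) + 1
    if max_page is not None and page > max_page:
        # Optional upper bound so the crawl does not run forever.
        return None
    # Keep everything before "_IP<n>.htm" and append the new page suffix.
    return url[:match.start()] + "_IP{}.htm".format(page)
```

Inside `parse`, this could be used as `next_url = next_page_url(response.url)` followed by `if next_url and urls: yield scrapy.Request(url=next_url, callback=self.parse)`. Checking `urls` stops the crawl once a page returns no job links; this assumes `response.url` still has the `_IP<number>.htm` form after any redirects.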
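Cool way to get the link, but I'm not sure it works: on every page your XPath query gives me the same result, https://www.monster.ca/jobs/search/?q=data-analyst&page=2. Even on https://www.monster.ca/jobs/search/?q=data-analyst&page=6 the XPath returns the link with page number 2. Could you please check?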
@AshishKapil Are you sure? It works for me: on page 6, the Scrapy shell gives me Out[1]: u'https://www.monster.ca/jobs/search/?q=data-analyst&page=7'.
Your query is perfect. I think the problem is on my end: no matter which page I give the Scrapy shell, it only loads the first page. Thanks a lot again, Thomas :))