Why can't Scrapy crawl/parse?

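This may be a duplicate question. I'm trying to run a Scrapy spider, but it won't run. Why do I get the error message "HtmlResponse has no attribute urljoin"? If request_count is 3 and response_count is also 3, what do the Scrapy stats imply? My code is below. I'd appreciate any help with this.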

import scrapy 
from scrapy.http.request import Request 
from scrapy.spiders import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

class BotSpider_2(BaseSpider): 
    name = 'BotSpider_2' 
    name = "google.co.th" 
    start_urls = ["http://www.google.co.th/"] 


    def parse(self, response): 
     sel = Selector(response) 
     sites = sel.xpath('//title/text()').extract() 
     print sites


2016-09-28 Pavitra Atha

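First of all, your imports are incorrect. For example, why are you using BaseSpider instead of Spider? You are also not importing Selector, even though parse() uses it. As for the urljoin error you describe, I don't see how the code you posted could raise it, since it never calls urljoin. urljoin has been a method of the response object since Scrapy v1; it joins the current URL with a path to produce an absolute URL that can be used for crawling.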

$ scrapy shell "https://scrapy.org" 
In [1]: response.url 
Out[1]: 'https://scrapy.org' 

In [2]: response.urljoin('/some/cool/path') 
Out[2]: 'https://scrapy.org/some/cool/path'
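If the error comes from an install where the response object lacks urljoin, a small fallback can cover both cases. The sketch below assumes Python 3 and relies on the standard library's urllib.parse.urljoin, which, as far as I know, follows the same resolution rules as response.urljoin. The names absolute_url and OldResponse are hypothetical and exist only for illustration; they are not part of Scrapy.

```python
from urllib.parse import urljoin


def absolute_url(response, path):
    """Resolve `path` against the URL of `response`.

    Hypothetical helper: prefers response.urljoin when the method exists
    and falls back to the standard library when it is missing.
    """
    if hasattr(response, "urljoin"):
        return response.urljoin(path)
    return urljoin(response.url, path)


class OldResponse:
    """Stand-in for a response object that only exposes `url` (assumption)."""

    def __init__(self, url):
        self.url = url


old = OldResponse("https://scrapy.org/download/")
# A path starting with "/" replaces the whole path of the base URL
print(absolute_url(old, "/some/cool/path"))
# A relative path is resolved against the base URL's directory
print(absolute_url(old, "news.html"))
```

Inside a spider, the same helper could be called as absolute_url(response, href) wherever a relative link needs to become a full URL.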

I've cleaned up the imports, and your code works like a charm!

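import scrapy
from scrapy.selector import Selector


class BotSpider_2(scrapy.Spider):
    name = "google.co.th"
    start_urls = ["http://www.google.co.th/"]

    def parse(self, response):
        # Wrap the response in a Selector so it can be queried with XPath
        sel = Selector(response)
        # Extract the text of every <title> element as a list of strings
        sites = sel.xpath('//title/text()').extract()
        print(sites)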


2016-09-28 06:10:41 Granitosaurus
