2017-10-09 75 views
-1

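I'm using Python 2.7.10 and trying to extract data from a web page by running this spider. When I first installed Scrapy and ran it from my Mac terminal, I was able to get the data. Now I can't get any data and receive a Traceback error instead; the crawl does not complete when I run Scrapy.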

import scrapy 

class ShopcluesSpider(scrapy.Spider): 

    name = 'shopclues' 
    allowed_domains = ['www.shopclues.com/mobiles-featured-store-4g-smartphone.html'] 
    start_urls = ['http://www.shopclues.com/mobiles-featured-store-4g-smartphone.html/'] 
    #custom_settings = {'FEED_URI' : 'tmp/shopclues.csv'} 

    def parse(self, response): 
     titles = response.css('img::attr(title)').extract() 
     #images = response.css('img::attr(data-img)').extract() 
     prices = response.css('.p_price::text').extract() 
     discounts = response.css('.prd_discount::text').extract() 

     for item in zip(titles,prices,discounts): 
      scraped_info = { 
      'title' : item[0], 
      'price' : item[1], 
      #'image_urls' : [item[2]], #Set's the url for scrapy to download images 
      'discount' : item[2] 
      } 

      yield scraped_info 

I get the following error:

Traceback (most recent call last): 
    File "/usr/local/bin/scrapy", line 11, in <module> 
    sys.exit(execute()) 
    File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 148, in execute 
    cmd.crawler_process = CrawlerProcess(settings) 


File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 243, in __init__ 
    super(CrawlerProcess, self).__init__(settings) 
    File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 134, in __init__ 
    self.spider_loader = _get_spider_loader(settings) 
    File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 330, in _get_spider_loader 
    return loader_cls.from_settings(settings.frozencopy()) 
    File "/Library/Python/2.7/site-packages/scrapy/spiderloader.py", line 61, in from_settings 
    return cls(settings) 
    File "/Library/Python/2.7/site-packages/scrapy/spiderloader.py", line 25, in __init__ 
    self._load_all_spiders() 
    File "/Library/Python/2.7/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders 
    for module in walk_modules(name): 
    File "/Library/Python/2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules 
    submod = import_module(fullpath) 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module 
    __import__(name) 
    File "/Users/acetonemarketing/Documents/scrapy/ourfirstscraper/ourfirstscraper/spiders/shopclues.py", line 16 
    for item in zip(titles,prices,discounts): 
    ^
IndentationError: unexpected indent 
+0

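'IndentationError' is about how your source code is formatted. Python uses indentation to structure source code, so it is sensitive to bad indentation. That said, when I copied your code I had no problems.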

+0

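Thanks for the reply, @TomášLinhart. Since you didn't get any error, could this be related to the user account I'm running the spider under? When I installed Scrapy, I had to use sudo -H pip install scrapy to get it done.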

+0

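It has nothing to do with the user account you run the spider under. The indentation in your source file is bad, although that is not apparent from what you posted.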

Answer

0
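Based on the source code you provided, the problem is that the file mixes spaces and tabs. As mentioned, Python is very sensitive to this. In general, use only 4 spaces per indentation level (as recommended by PEP 8).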

Specifically, remove the tabs on the lines of the for loop and replace them with the corresponding number of spaces.
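
As a rough sketch of that fix, the snippet below lists the lines whose indentation contains a tab and rewrites them with spaces. The function names and the sample string are made up for illustration, and it assumes each tab stands for exactly one indentation level (4 spaces), which may not match how your editor wrote the file.

```python
import re

# Matches the run of spaces/tabs at the start of a line (possibly empty).
LEADING_WS = re.compile(r"^[ \t]*")


def find_tab_indented_lines(source):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = LEADING_WS.match(line).group(0)
        if "\t" in indent:
            hits.append(number)
    return hits


def replace_indent_tabs(source, width=4):
    """Replace every tab in the leading whitespace with `width` spaces.

    Assumption: one tab equals one indentation level.
    """
    fixed = []
    for line in source.splitlines(True):  # True keeps the line endings
        indent = LEADING_WS.match(line).group(0)
        fixed.append(indent.replace("\t", " " * width) + line[len(indent):])
    return "".join(fixed)


# Hypothetical sample: the loop lines use tabs, the other lines use spaces.
sample = (
    "def parse(self, response):\n"
    "    titles = []\n"
    "\tfor item in titles:\n"
    "\t\tyield item\n"
)

print(find_tab_indented_lines(sample))   # lines that need fixing
fixed = replace_indent_tabs(sample)
compile(fixed, "<sample>", "exec")       # raises if indentation is still broken
```

To apply it to the real file, you could read shopclues.py, pass its contents to replace_indent_tabs, check the result, and write it back. Most editors also have a "convert tabs to spaces" command that does the same job.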

+0

Thanks for the answer. If I use four spaces on every line, won't that distort the nested structure of the code?

+0

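I meant 4 spaces per indentation level, not 4 spaces per line. That is, the 'class ShopcluesSpider ...' line starts at the beginning of the line, the 'def parse ...' line is indented by 4 spaces, and so on.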
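
To illustrate "4 spaces per level", here is a sketch of the loop from the question moved into a plain function so it runs without Scrapy. The name build_items and the sample values are hypothetical; only the indentation pattern matters.

```python
def build_items(titles, prices, discounts):          # level 0: no indentation
    for item in zip(titles, prices, discounts):      # level 1: 4 spaces
        scraped_info = {                             # level 2: 8 spaces
            'title': item[0],                        # level 3: 12 spaces
            'price': item[1],
            'discount': item[2],
        }
        yield scraped_info                           # level 2 again: 8 spaces


# Made-up values, just to show the function runs.
for row in build_items(['Phone A'], ['Rs.4999'], ['20% Off']):
    print(row)
```

Each nested block adds 4 more spaces, and a line that belongs to an outer block goes back to that block's indentation.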

+0

Thanks for the clarification. The indentation error, at least, is now resolved.