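I'm using Python 2.7.10 and trying to extract data from a web page by running this spider. When I first installed Scrapy and ran it in my Mac terminal, I was able to get the initial data. Now I can't get any data and receive a Traceback error instead. The crawl cannot complete because Scrapy fails while it is starting up.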
import scrapy


class ShopcluesSpider(scrapy.Spider):
    name = 'shopclues'
    allowed_domains = ['www.shopclues.com/mobiles-featured-store-4g-smartphone.html']
    start_urls = ['http://www.shopclues.com/mobiles-featured-store-4g-smartphone.html/']
    #custom_settings = {'FEED_URI' : 'tmp/shopclues.csv'}

    def parse(self, response):
        titles = response.css('img::attr(title)').extract()
        #images = response.css('img::attr(data-img)').extract()
        prices = response.css('.p_price::text').extract()
        discounts = response.css('.prd_discount::text').extract()
        for item in zip(titles,prices,discounts):
            scraped_info = {
                'title' : item[0],
                'price' : item[1],
                #'image_urls' : [item[2]], #Set's the url for scrapy to download images
                'discount' : item[2]
            }
            yield scraped_info
I get the following error:
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 148, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 243, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 134, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 330, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/Library/Python/2.7/site-packages/scrapy/spiderloader.py", line 61, in from_settings
    return cls(settings)
  File "/Library/Python/2.7/site-packages/scrapy/spiderloader.py", line 25, in __init__
    self._load_all_spiders()
  File "/Library/Python/2.7/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
    for module in walk_modules(name):
  File "/Library/Python/2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/Users/acetonemarketing/Documents/scrapy/ourfirstscraper/ourfirstscraper/spiders/shopclues.py", line 16
    for item in zip(titles,prices,discounts):
    ^
IndentationError: unexpected indent
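'IndentationError' is about how your source code is formatted. Python uses indentation to structure source code, so it is sensitive to bad indentation. However, when I copied your code, I had no problems.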
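Thanks for the reply, @TomášLinhart. Since you didn't hit any errors, could this be related to the user account I run the spider under? When I installed Scrapy, I had to use sudo -H pip install scrapy to get it done.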
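It has nothing to do with the user account you run the spider under. The indentation in your source file is bad, but that isn't evident from what you posted.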
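One common reason indentation problems are invisible in pasted code is a tab hiding among space-indented lines. That is only a guess about this particular file, not something the thread confirms. As a minimal sketch under that assumption, the helper below (`find_suspicious_indentation`, `compiles_cleanly`, and the `sample` string are hypothetical names made up for illustration) reports lines whose leading whitespace contains a tab and checks whether the text compiles at all.

```python
def find_suspicious_indentation(source):
    """Return (line_number, reason) pairs for lines indented with a tab."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # Ignore blank or whitespace-only lines.
        # Leading whitespace is everything before the first non-blank character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent and " " in indent:
            findings.append((number, "mixes tabs and spaces"))
        elif "\t" in indent:
            findings.append((number, "indented with tabs"))
    return findings


def compiles_cleanly(source, filename="<sample>"):
    """Return True if the source compiles without an indentation error."""
    try:
        compile(source, filename, "exec")
    except IndentationError:  # TabError is a subclass, so it is caught too.
        return False
    return True


# Hypothetical sample: line 2 uses four spaces, line 3 starts with a tab.
sample = "def parse():\n    titles = []\n\tfor item in titles:\n        pass\n"

for number, reason in find_suspicious_indentation(sample):
    print("line %d: %s" % (number, reason))
print("compiles cleanly: %s" % compiles_cleanly(sample))
```

To check a real file, read it first (for example `open(path).read()`, with `path` pointing at your own spider) and pass the text to both helpers. The standard library's `python -m tabnanny yourfile.py` and, on Python 2, the `-tt` interpreter flag do a similar job.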