2017-07-04 187 views

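No module named recording while trying to record a Scrapy crawl

When I try to record a crawl using Frontera and Scrapy, it fails with an error saying there is no module named recording. I can't understand why this happens, because I followed the recording steps from the official link. Please help, and thanks in advance. The traceback is: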

2017-07-04 15:38:57 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: alexa) 
2017-07-04 15:38:57 [scrapy.utils.log] INFO: Overridden settings: {'AUTOTHROTTLE_MAX_DELAY': 3.0, 'DOWNLOAD_MAXSIZE': 10485760, 'SPIDER_MODULES': ['alexa.spiders'], 'CONCURRENT_REQUESTS_PER_DOMAIN': 10, 'CONCURRENT_REQUESTS': 256, 'RANDOMIZE_DOWNLOAD_DELAY': False, 'RETRY_ENABLED': False, 'DUPEFILTER_CLASS': 'alexa.bloom_filter1.BLOOMDupeFilter', 'AUTOTHROTTLE_START_DELAY': 0.25, 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'alexa', 'AJAXCRAWL_ENABLED': True, 'COOKIES_ENABLED': False, 'SCHEDULER': 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler', 'DOWNLOAD_TIMEOUT': 120, 'AUTOTHROTTLE_ENABLED': True, 'NEWSPIDER_MODULE': 'alexa.spiders'} 
2017-07-04 15:38:57 [scrapy.middleware] INFO: Enabled extensions: 
['scrapy.extensions.memusage.MemoryUsage', 
'scrapy.extensions.logstats.LogStats', 
'scrapy.extensions.telnet.TelnetConsole', 
'scrapy.extensions.corestats.CoreStats', 
'scrapy.extensions.throttle.AutoThrottle'] 
Unhandled error in Deferred: 
2017-07-04 15:38:57 [twisted] CRITICAL: Unhandled error in Deferred: 

2017-07-04 15:38:57 [twisted] CRITICAL: 
Traceback (most recent call last): 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks 
    result = g.send(result) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl 
    six.reraise(*exc_info) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl 
    self.engine = self._create_engine() 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine 
    return ExecutionEngine(self, lambda _: self.stop()) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__ 
    self.downloader = downloader_cls(crawler) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__ 
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler 
    return cls.from_settings(crawler.settings, crawler) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings 
    mwcls = load_object(clspath) 
    File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object 
    mod = import_module(module) 
    File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module 
    __import__(name) 
ImportError: No module named recording 

I suggest you open an issue at [frontera on GitHub](https://github.com/scrapinghub/frontera/issues/new).

Answer


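I had the same problem when following the official doc. I found a solution after reading a Scrapinghub blog post.

The problem is that the official documentation is outdated. It uses middlewares that no longer exist: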

SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderSpiderMiddleware': 1000,
})

DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderDownloaderMiddleware': 1000,
})

Instead of the recording middlewares, you need to use the scheduler ones:

SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
})

DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
})
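
If you want to find out which middleware paths in your settings no longer resolve before starting a crawl, a small check along these lines may help. It is only a sketch: `find_unimportable` is a hypothetical helper name, not part of Scrapy or Frontera. It roughly mimics what the traceback above shows Scrapy doing in `load_object`, which is to import the module part of the dotted path and then look up the class on it.

```python
import importlib


def find_unimportable(paths):
    """Return {dotted_path: error message} for every path that cannot be loaded.

    Hypothetical helper for illustration only. Each path is split into a
    module part and an attribute part, the module is imported, and the
    attribute is looked up on it.
    """
    broken = {}
    for path in paths:
        module_name, _, attr = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            getattr(module, attr)
        except (ImportError, AttributeError, ValueError) as exc:
            # ImportError: the module does not exist (the case in the question).
            # AttributeError: the module exists but the class does not.
            # ValueError: the path has no module part at all.
            broken[path] = str(exc)
    return broken


if __name__ == "__main__":
    # Paths taken from the settings discussed above; the result depends on
    # the Frontera version that is installed.
    candidates = [
        "frontera.contrib.scrapy.middlewares.recording.CrawlRecorderSpiderMiddleware",
        "frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware",
    ]
    for path, error in find_unimportable(candidates).items():
        print("cannot load %s: %s" % (path, error))
```

Any path that is reported here would also fail when Scrapy builds its middleware managers, so it should be removed or replaced in the settings file.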