
I'm trying to log in with scrapy-splash the same way I would with plain Scrapy. I looked at the documentation (Doc), which says "SplashFormRequest.from_response is also supported, and works as described in the scrapy documentation". However, simply changing that one line of code and adding the settings described in the splash documentation produces no results at all. What am I doing wrong? Code (scrapy.FormRequest.from_response vs. SplashFormRequest.from_response):

import scrapy
from scrapy_splash import SplashRequest, SplashFormRequest

class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl"]

    def parse(self, response):
        return SplashFormRequest.from_response(
            response,
            formdata={'username': 'pass', 'password': 'pass'},
            callback=self.after_login
        )

    def after_login(self, response):
        print(response.body)
        if "keyword" in response.body:
            self.logger.error("Success")
        else:
            self.logger.error("Failed")

Added to settings:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPLASH_URL = 'http://localhost:8050' 
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter' 
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage' 

Error log:

[email protected]:~/Python/code/lostfilm$ scrapy crawl lost 
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: lostfilm) 
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'lostfilm.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_MODULES': ['lostfilm.spiders'], 'BOT_NAME': 'lostfilm', 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage'} 
2017-01-26 20:24:22 [scrapy.middleware] INFO: Enabled extensions: 
['scrapy.extensions.logstats.LogStats', 
'scrapy.extensions.telnet.TelnetConsole', 
'scrapy.extensions.corestats.CoreStats'] 
Unhandled error in Deferred: 
2017-01-26 20:24:22 [twisted] CRITICAL: Unhandled error in Deferred: 

2017-01-26 20:24:22 [twisted] CRITICAL: 
Traceback (most recent call last): 
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks 
    result = g.send(result) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 90, in crawl 
six.reraise(*exc_info) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 72, in crawl 
self.engine = self._create_engine() 
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 97, in _create_engine 
return ExecutionEngine(self, lambda _: self.stop()) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 69, in __init__ 
self.downloader = downloader_cls(crawler) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__ 
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler 
return cls.from_settings(crawler.settings, crawler) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 34, in from_settings 
mwcls = load_object(clspath) 
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 49, in load_object 
raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name)) 
NameError: Module 'scrapy.downloadermiddlewares.httpcompression' doesn't define any object named 'HttpCompresionMiddlerware' 

The error message points to a configuration problem: there is a typo in the middleware name - it should be HttpCompressionMiddleware, not HttpCompresionMiddlerware. I'm not sure this will fix the Splash issue, but it's best to resolve the non-Splash problems first. –
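As the comment notes, a single misspelled dotted path is enough to make Scrapy's middleware loader raise NameError at startup. A minimal sketch of the corrected setting (using the exact paths already shown in the question) would be:

```python
# Corrected DOWNLOADER_MIDDLEWARES entry: "Compression" has a double "s"
# and the class is "Middleware" (no extra "r"); the string must match
# the class name defined in scrapy.downloadermiddlewares.httpcompression.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
```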


I need to get my eyes checked. That actually made everything work. Cheers, mate! –

Answer


You probably need to perform the first request through Splash as well.

By default, the start_urls attribute issues "plain" scrapy.Request objects, not SplashRequest.

You need to override the start_requests method in your spider:

class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url)
    ...

Thanks for the suggestion. However, I still get the same error after updating. –
