scrapy.FormRequest.from_response vs. SplashFormRequest.from_response

I'm trying to log in with scrapy-splash the same way I would with plain Scrapy. I looked at the docs, which say "SplashFormRequest.from_response is also supported; it works as described in scrapy documentation". However, simply changing that one line of code and adding the settings described in the splash documentation produces no results at all. What am I doing wrong?

Code:
import scrapy
from scrapy_splash import SplashFormRequest

class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl"]

    def parse(self, response):
        return SplashFormRequest.from_response(
            response,
            formdata={'username': 'pass', 'password': 'pass'},
            callback=self.after_login
        )

    def after_login(self, response):
        print response.body
        if "keyword" in response.body:
            self.logger.error("Success")
        else:
            self.logger.error("Failed")
Added to settings:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
Error log:
[email protected]:~/Python/code/lostfilm$ scrapy crawl lost
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: lostfilm)
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'lostfilm.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_MODULES': ['lostfilm.spiders'], 'BOT_NAME': 'lostfilm', 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage'}
2017-01-26 20:24:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
Unhandled error in Deferred:
2017-01-26 20:24:22 [twisted] CRITICAL: Unhandled error in Deferred:
2017-01-26 20:24:22 [twisted] CRITICAL:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 90, in crawl
six.reraise(*exc_info)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 49, in load_object
raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))
NameError: Module 'scrapy.downloadermiddlewares.httpcompression' doesn't define any object named 'HttpCompresionMiddlerware'
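For context, the NameError at the bottom of the traceback comes from Scrapy's load_object helper: it imports the module successfully, but then fails to find an attribute with the (misspelled) name. A minimal sketch of that lookup logic, written for Python 3 and using a stdlib path so it runs without Scrapy installed (the real helper lives at scrapy.utils.misc.load_object):

```python
import importlib

def load_object(path):
    # Split the dotted path into module path and object name,
    # mirroring what scrapy.utils.misc.load_object does.
    module_path, _, name = path.rpartition('.')
    module = importlib.import_module(module_path)
    try:
        return getattr(module, name)
    except AttributeError:
        # Same failure mode as the traceback above: the module
        # imports fine, but no attribute matches the given name.
        raise NameError("Module %r doesn't define any object named %r"
                        % (module_path, name))

# A correctly spelled path resolves to the class itself:
print(load_object('json.decoder.JSONDecoder'))

# A misspelled attribute name raises NameError, as in the log:
try:
    load_object('json.decoder.JSONDecodr')
except NameError as e:
    print(e)
```

This is why a single-character typo in a DOWNLOADER_MIDDLEWARES key crashes the crawler at engine startup, before any request is made.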
The error message points to a configuration problem: there is a typo in the middleware name. It should be HttpCompressionMiddleware, not HttpCompresionMiddlerware. I'm not sure this will fix the Splash problem, but it's best to resolve the non-Splash issues first. –
I need to get my eyes checked. That actually made everything work. Cheers, mate! –