2017-07-17 638 views

I am running Scrapy (version 1.4.0) from a script using CrawlerProcess. The URL comes from user input. The first run works fine, but on the second run it raises a twisted.internet.error.ReactorNotRestartable error, and the program gets stuck there.

The CrawlerProcess section:

process = CrawlerProcess({ 
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)' 
}) 

process.crawl(GeneralSpider) 

print('~~~~~~~~~~~~ Processing is going to be started ~~~~~~~~~~') 
process.start() 
print('~~~~~~~~~~~~ Processing ended ~~~~~~~~~~') 
process.stop() 

Here is the output of the first run:

~~~~~~~~~~~~ Processing is going to be started ~~~~~~~~~~ 
2017-07-17 05:58:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.some-url.com/content.php> (referer: None) 
2017-07-17 05:58:46 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'HtmlResponse' in <GET http://www.some-url.com/content.php> 
2017-07-17 05:58:46 [scrapy.core.engine] INFO: Closing spider (finished) 
2017-07-17 05:58:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats: 
{'downloader/request_bytes': 261, 
'downloader/request_count': 1, 
'downloader/request_method_count/GET': 1, 
'downloader/response_bytes': 14223, 
'downloader/response_count': 1, 
'downloader/response_status_count/200': 1, 
'finish_reason': 'finished', 
'finish_time': datetime.datetime(2017, 7, 17, 5, 58, 46, 760661), 
'log_count/DEBUG': 2, 
'log_count/ERROR': 1, 
'log_count/INFO': 7, 
'memusage/max': 49983488, 
'memusage/startup': 49983488, 
'response_received_count': 1, 
'scheduler/dequeued': 1, 
'scheduler/dequeued/memory': 1, 
'scheduler/enqueued': 1, 
'scheduler/enqueued/memory': 1, 
'start_time': datetime.datetime(2017, 7, 17, 5, 58, 45, 162155)} 
2017-07-17 05:58:46 [scrapy.core.engine] INFO: Spider closed (finished) 
~~~~~~~~~~~~ Processing ended ~~~~~~~~~~ 

When I try to run it a second time, it raises this error:

~~~~~~~~~~~~ Processing is going to be started ~~~~~~~~~~ 
[2017-07-17 06:03:18,075] ERROR in app: Exception on /scripts/1/process [GET] 
Traceback (most recent call last): 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app 
    response = self.full_dispatch_request() 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request 
    rv = self.handle_user_exception(e) 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception 
    reraise(exc_type, exc_value, tb) 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise 
    raise value 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request 
    rv = self.dispatch_request() 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request 
    return self.view_functions[rule.endpoint](**req.view_args) 
    File "api.py", line 13, in process_crawler 
    processor.process() 
    File "/var/www/python/crawlerapp/application/scripts/general_spider.py", line 124, in process 
    process.start() 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/scrapy/crawler.py", line 285, in start 
    reactor.run(installSignalHandlers=False) # blocking call 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/twisted/internet/base.py", line 1242, in run 
    self.startRunning(installSignalHandlers=installSignalHandlers) 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/twisted/internet/base.py", line 1222, in startRunning 
    ReactorBase.startRunning(self) 
    File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/twisted/internet/base.py", line 730, in startRunning 
    raise error.ReactorNotRestartable() 
twisted.internet.error.ReactorNotRestartable 

How can I restart the reactor, or stop it after each crawl finishes?

There are similar questions on Stack Overflow, but their solutions target older versions of Scrapy and do not work here.


Possible duplicate of [Scrapy - Reactor not Restartable](https://stackoverflow.com/questions/41495052/scrapy-reactor-not-restartable) –

Answers


You can pass stop_after_crawl=False when starting the process:

process.start(stop_after_crawl=False)

Hopefully this solves your problem. Thanks.


Tried it, but it gets stuck there: the process does not stop and keeps running. – Sovon


This happens when you try to run the reactor twice in the same process. process.start() starts the reactor, and Twisted's reactor can only be started once.

Please share how you take the user input and pass it to the spider.
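The restriction can be pictured with a small stand-in class (a hypothetical analogue for illustration, not Twisted's actual implementation): once run() has been called, a second call is refused, just as Twisted raises ReactorNotRestartable.

```python
class OneShotReactor:
    """Toy analogue of Twisted's reactor: it may be started only once."""

    def __init__(self):
        self._started = False

    def run(self):
        if self._started:
            # Mirrors twisted.internet.error.ReactorNotRestartable
            raise RuntimeError("ReactorNotRestartable")
        self._started = True
        # ... the event loop would run here until stopped ...
```

The second process.start() hits exactly this kind of guard, because the same reactor instance survives inside the still-running Flask worker process.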


Try starting the function in a separate process:

from multiprocessing import Process

def crawl():
    pass  # build the CrawlerProcess and call process.start() here

Process(target=crawl).start()
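Expanded into a runnable sketch of the pattern: each crawl runs in a fresh child process, so each call gets a brand-new reactor and ReactorNotRestartable never fires. The crawl body below is a stand-in that just reports back through a Queue; in the real app it would build the CrawlerProcess and call process.start().

```python
from multiprocessing import Process, Queue

def crawl(url, results):
    # Stand-in body: in the real app, build the CrawlerProcess here,
    # schedule the spider with process.crawl(...), then process.start().
    # Because this runs in a child process, the reactor is created
    # fresh each time and dies with the child.
    results.put("crawled " + url)

def run_crawl(url):
    results = Queue()
    p = Process(target=crawl, args=(url, results))
    p.start()
    p.join()  # wait for the child (and its reactor) to finish
    return results.get()
```

run_crawl() can then be called from the Flask view on every request without touching the parent process's reactor state.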