2017-08-06 79 views

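Why is BeautifulSoup related to 'Task exception was never retrieved'?

I want to use coroutines to fetch and parse web pages. I wrote a sample and tested it. The program runs fine with Python 3.5 on Ubuntu 16.04 and exits once all the work is done. The source code is below.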

import aiohttp
import asyncio
from bs4 import BeautifulSoup

async def coro():
    coro_loop = asyncio.get_event_loop()
    url = u'https://www.python.org/'
    for _ in range(4):
        async with aiohttp.ClientSession(loop=coro_loop) as coro_session:
            with aiohttp.Timeout(30, loop=coro_session.loop):
                async with coro_session.get(url) as resp:
                    print('get response from url: %s' % url)
                    source_code = await resp.read()
                    soup = BeautifulSoup(source_code, 'lxml')

def main():
    loop = asyncio.get_event_loop()
    worker = loop.create_task(coro())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        print('keyboard interrupt')
        worker.cancel()
    finally:
        loop.stop()
        loop.run_forever()
        loop.close()

if __name__ == '__main__':
    main()

While testing, I found that when I stop the program by pressing Ctrl+C, the error 'Task exception was never retrieved' appears.

^Ckeyboard interrupt
Task exception was never retrieved
future: <Task finished coro=<coro() done, defined at ./test.py:8> exception=KeyboardInterrupt()>
Traceback (most recent call last):
  File "./test.py", line 23, in main
    loop.run_until_complete(worker)
  File "/usr/lib/python3.5/asyncio/base_events.py", line 375, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 345, in run_forever
    self._run_once()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 1312, in _run_once
    handle._run()
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.5/asyncio/tasks.py", line 307, in _wakeup
    self._step()
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "./test.py", line 17, in coro
    soup = BeautifulSoup(source_code, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 215, in __init__
    self._feed()
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "/usr/lib/python3/dist-packages/bs4/builder/_lxml.py", line 240, in feed
    self.parser.feed(markup)
  File "src/lxml/parser.pxi", line 1194, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:119773)
  File "src/lxml/parser.pxi", line 1316, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:119644)
  File "src/lxml/parsertarget.pxi", line 141, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:137264)
  File "src/lxml/parsertarget.pxi", line 135, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:137128)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:11090)
  File "src/lxml/saxparser.pxi", line 499, in lxml.etree._handleSaxData (src/lxml/lxml.etree.c:131013)
  File "src/lxml/parsertarget.pxi", line 88, in lxml.etree._PythonSaxParserTarget._handleSaxData (src/lxml/lxml.etree.c:136397)
  File "/usr/lib/python3/dist-packages/bs4/builder/_lxml.py", line 206, in data
    def data(self, content):
KeyboardInterrupt

I looked through the official Python docs but found no clue. I then tried catching the KeyboardInterrupt inside coro():

try:
    soup = BeautifulSoup(source_code, 'lxml')
except KeyboardInterrupt:
    print('capture exception')
    raise

The error occurs whenever the KeyboardInterrupt is caught by the try/except around BeautifulSoup(). It looks as if BeautifulSoup causes the error, but how can I fix it?


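This has nothing to do with BeautifulSoup. The warning appears when you never retrieve an exception that was raised inside a task. You need to add a call to worker.exception() somewhere. – dirn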
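To illustrate the comment, here is a minimal sketch of what calling exception() on a finished task does. It is written against current asyncio rather than the Python 3.5 setup from the question, and the names failing and run_and_inspect, as well as the ValueError standing in for the KeyboardInterrupt, are made up for the example.

```python
import asyncio


async def failing():
    # Hypothetical stand-in for coro(): it simply fails on purpose.
    raise ValueError("boom")


def run_and_inspect():
    loop = asyncio.new_event_loop()
    try:
        task = loop.create_task(failing())
        # asyncio.wait() runs the task to completion without re-raising
        # (and without retrieving) the exception stored in it.
        loop.run_until_complete(asyncio.wait([task]))
        # exception() returns the stored exception and marks it as retrieved,
        # so asyncio has no reason to warn when the task is garbage-collected.
        return task.exception()
    finally:
        loop.close()
```

Calling run_and_inspect() should hand back the ValueError instance instead of raising it; in main() from the question the equivalent call would be worker.exception() once the task is done.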

Answer


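When you call task.cancel(), the function does not actually cancel the task; it only "marks" the task to be cancelled. The actual cancellation starts when the task resumes execution: asyncio.CancelledError is then raised inside the task right away, forcing it to be cancelled for real. The task finishes its execution with this exception.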
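The two-step behaviour can be observed directly. The following is only a sketch against current asyncio; sleeper and demo are hypothetical names, and the long sleep merely stands in for the network wait in the question.

```python
import asyncio
from contextlib import suppress


async def sleeper(log):
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # Reached only once the loop resumes the task after cancel().
        log.append("CancelledError raised inside the task")
        raise


def demo():
    log = []
    loop = asyncio.new_event_loop()
    try:
        task = loop.create_task(sleeper(log))
        # Run one short step of the loop so the task reaches its await.
        loop.run_until_complete(asyncio.sleep(0))
        task.cancel()               # only a request at this point
        before = task.cancelled()   # the task has not been resumed yet
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(task)  # the task resumes and gets cancelled
        after = task.cancelled()
        return before, log, after
    finally:
        loop.close()
```

demo() should report that the task is not yet cancelled right after cancel(), and that it is cancelled only after the loop has run again and the CancelledError has passed through the coroutine.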

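On the other hand, asyncio warns you if one of your tasks silently finished with an exception, that is, if you never checked the result of the task's execution.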
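A rough way to see where the warning comes from is to install a loop exception handler and let a failed task be garbage-collected without ever looking at its result. This is a sketch under the assumption of CPython's current asyncio; failing and collect_warnings are hypothetical names.

```python
import asyncio
import gc


async def failing():
    # Hypothetical task body that ends with an exception.
    raise ValueError("boom")


def collect_warnings():
    messages = []
    loop = asyncio.new_event_loop()
    # Record the messages asyncio would otherwise write to its logger.
    loop.set_exception_handler(
        lambda lp, context: messages.append(context["message"]))
    try:
        task = loop.create_task(failing())
        loop.run_until_complete(asyncio.sleep(0))  # let the task run and fail
        del task      # drop the last reference without reading the result
        gc.collect()  # make sure the task object is finalized
    finally:
        loop.close()
    return messages
```

The returned list is expected to contain the same 'Task exception was never retrieved' message that appears in the question, which shows that the warning is tied to the unread exception and not to the library that happened to be running.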

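To avoid the problem, you should await the task's cancellation so that asyncio.CancelledError is actually received (and probably suppress it, since you have no further use for it):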

import asyncio
from contextlib import suppress


async def coro():
    # ... (the coroutine from the question)
    pass

def main():
    loop = asyncio.get_event_loop()
    worker = asyncio.ensure_future(coro())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        print('keyboard interrupt')

        # Request cancellation, then run the loop again so the task
        # actually receives asyncio.CancelledError and finishes.
        worker.cancel()
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(worker)  # await task cancellation.
    finally:
        loop.close()

if __name__ == '__main__':
    main()
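One case may need extra care. In the traceback from the question the task is reported as finished with exception=KeyboardInterrupt(), so the interrupt apparently landed while the task itself was running. A task in that state has nothing left to cancel, and its stored exception still has to be read. The sketch below covers both states; it is an assumption-laden illustration against current asyncio, not part of the original answer, and shutdown, failing and demo are hypothetical names with a ValueError standing in for the KeyboardInterrupt.

```python
import asyncio
from contextlib import suppress


def shutdown(loop, task):
    """Stop `task` and return the exception it finished with, if any."""
    if not task.done():
        # Still pending: request cancellation and let the loop deliver it.
        task.cancel()
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(task)
    if task.cancelled():
        return None
    # Already finished: reading the exception marks it as retrieved.
    return task.exception()


async def failing():
    # Stand-in for a task that was interrupted in the middle of its own code.
    raise ValueError("boom")


def demo():
    loop = asyncio.new_event_loop()
    try:
        failed = loop.create_task(failing())
        pending = loop.create_task(asyncio.sleep(3600))
        loop.run_until_complete(asyncio.sleep(0))  # let both tasks start
        return shutdown(loop, failed), shutdown(loop, pending), pending.cancelled()
    finally:
        loop.close()
```

In main(), shutdown(loop, worker) would take the place of the cancel-and-wait lines in the except KeyboardInterrupt branch; whether the returned exception should be logged or re-raised is left to the caller.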