2017-08-08 239 views

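Python asyncio loop: concurrent.futures.ThreadPoolExecutor error

I am trying to fetch data from a set of URLs asynchronously. I want to run the requests over a group of URLs roughly every 10 seconds.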

import aiohttp 
import asyncio 

from aiohttp import ClientSession 


def create_list_urls():
    list_urls = [["http://apiexample.com/param1", "http://apiexample2.com/param1"],
                 ["http://apiexample.com/param2", "http://apiexample2.com/param2"]]
    return list_urls

async def retrieve_datas(url, session):
    async with session.get(url) as response:
        return await response.json()


async def main():
    while True:
        urls_to_crawl = create_list_urls()
        for urls in urls_to_crawl:
            tasks = []
            async with ClientSession() as session:
                for url in urls:
                    tasks.append(asyncio.ensure_future(
                        retrieve_datas(url, session)))
                datas_extracted = await asyncio.gather(*tasks, return_exceptions=False)
                print(datas_extracted)
        asyncio.sleep(10)

if __name__ == '__main__': 
    loop = asyncio.get_event_loop() 
    future = asyncio.ensure_future(main()) 
    loop.run_until_complete(future) 

But I get this error:

Traceback (most recent call last): 
    File "test.py", line 34, in <module> 
    loop.run_until_complete(future) 
    File "/usr/lib/python3.5/asyncio/base_events.py", line 466, in run_until_complete 
    return future.result() 
    File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result 
    raise self._exception 
    File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step 
    result = coro.throw(exc) 
    File "test.py", line 27, in main 
    datas_extracted = await asyncio.gather(*tasks, return_exceptions=False) 
    File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__ 
    yield self # This tells Task to wait for completion. 
    File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup 
    future.result() 
    File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result 
    raise self._exception 
    File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step 
    result = coro.send(None) 
    File "test.py", line 14, in retrieve_datas 
    async with session.get(url) as response: 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/client.py", line 603, in __aenter__ 
    self._resp = yield from self._coro 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/client.py", line 231, in _request 
    conn = yield from self._connector.connect(req) 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/connector.py", line 378, in connect 
    proto = yield from self._create_connection(req) 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/connector.py", line 687, in _create_connection 
    _, proto = yield from self._create_direct_connection(req) 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/connector.py", line 698, in _create_direct_connection 
    hosts = yield from self._resolve_host(req.url.raw_host, req.port) 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/connector.py", line 669, in _resolve_host 
    self._resolver.resolve(host, port, family=self._family) 
    File "/usr/local/lib/python3.5/dist-packages/aiohttp/resolver.py", line 31, in resolve 
    host, port, type=socket.SOCK_STREAM, family=family) 
    File "/usr/lib/python3.5/asyncio/base_events.py", line 673, in getaddrinfo 
    host, port, family, type, proto, flags) 
    File "/usr/lib/python3.5/asyncio/base_events.py", line 634, in run_in_executor 
    executor = concurrent.futures.ThreadPoolExecutor() 
TypeError: __init__() missing 1 required positional argument: 'max_workers' 

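So my question is how to fix this, but beyond that, I suspect I am not doing async the right way. The odd part is that if I step through the code in my IDE's debugger, one iteration completes (I receive the data for the first URL group) before the error is raised, whereas if I run the code directly the exception is raised immediately.

Edit:

With Python 3.6 the exception disappears. The code works, except that asyncio.sleep(10) is not executed (???): my code never sleeps. If I replace asyncio.sleep(10) with time.sleep(10), it works. I think I am missing something. My problem is solved, but I would appreciate it if someone could explain this sleep behaviour and, more generally, tell me whether my code is a correct way to do asynchronous requests.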


Which version of 'aiohttp' are you using? – nlsdfnbch


My version: 2.2.5 – Matt

Answers


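The error is raised not by aiohttp but by asyncio itself, which is very strange because that code is covered by tests.

Which Python version are you using? Is it a custom build?

As for asyncio.sleep(): put await before the call.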
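To illustrate the point about await, here is a minimal sketch of the polling loop with the sleep awaited. It is not the asker's exact program: `fake_fetch`, `poll`, and `run` are hypothetical names, and `fake_fetch` stands in for the real `session.get(url)` call so the example runs without a network. The number of rounds is bounded only so that the sketch terminates.

```python
import asyncio
import time


async def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request such as session.get(url).
    await asyncio.sleep(0.01)
    return {"url": url}


async def poll(url_groups, interval, rounds):
    """Fetch each group of URLs concurrently, pausing `interval` seconds per round."""
    results = []
    for _ in range(rounds):
        for urls in url_groups:
            # gather() runs the fetches of one group concurrently and
            # returns their results in the same order as the inputs.
            results.append(await asyncio.gather(*(fake_fetch(u) for u in urls)))
        # Calling asyncio.sleep() without `await` only creates a coroutine
        # object; the pause happens only when it is awaited.
        await asyncio.sleep(interval)
    return results


def run(coro):
    # Drive a coroutine to completion on a fresh event loop.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()
```

For example, `run(poll([["a", "b"], ["c"]], 0.1, 2))` fetches both groups twice and pauses after each round. Replacing the awaited sleep with `time.sleep()` would also pause, but it blocks the whole event loop, so no other task can make progress during the wait.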


Argh... I forgot the await :( Thank you, Andrew. As for my Python version: Python 3.5.3, and the code works with Python 3.6.2 – Matt


On 3.5.3 the 'max_workers' argument is not required: the parameter was mandatory in Python 3.4, but not for 3.5+ –
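If the loop's own executor creation is what fails, one possible workaround is to give the loop a default executor with an explicit max_workers before running anything. The sketch below shows only the mechanism; whether it would have avoided the error on the asker's installation is untested. `blocking_lookup` and `run_with_explicit_executor` are hypothetical names, and `blocking_lookup` merely stands in for a blocking call such as DNS resolution.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(name):
    # Hypothetical stand-in for a blocking call such as DNS resolution.
    return name.upper()


def run_with_explicit_executor(name, workers=4):
    loop = asyncio.new_event_loop()
    executor = ThreadPoolExecutor(max_workers=workers)
    # run_in_executor(None, ...) uses the loop's default executor, so
    # installing one up front means the loop does not have to build its own.
    loop.set_default_executor(executor)
    try:
        return loop.run_until_complete(
            loop.run_in_executor(None, blocking_lookup, name))
    finally:
        executor.shutdown(wait=True)
        loop.close()
```

Calling `run_with_explicit_executor("example.com")` runs the blocking function in the explicitly sized pool and returns its result.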