2016-08-18 135 views
2

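Using proxybroker in Python 3.5 raises an encoding error

I am trying to use proxybroker to generate a file of active proxies for certain countries. I always get the same error when trying to fetch the proxies. It appears to be an encoding/decoding error in a package that proxybroker uses, but I suspect there may be a better way to use proxybroker.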

This is the code that causes the problem:

import asyncio

from proxybroker import Broker


def gather_proxies(countries):
    """
    Use the proxybroker package to asynchronously get two new proxies per specified country
    and return them as a list of [country, proxy] pairs.

    :param countries: The ISO-style country codes to fetch proxies for, as a list of two-letter strings.
    :return: A list of proxies, each itself a list with two elements: [location, proxy address].
    """
    proxy_list = []
    types = ['HTTP']
    for country in countries:
        loop = asyncio.get_event_loop()

        proxies = asyncio.Queue(loop=loop)
        broker = Broker(proxies, loop=loop)

        # Pass the country code as a one-element list, since `countries` is meant to be a list of codes.
        loop.run_until_complete(broker.find(limit=2, countries=[country], types=types))

        while True:
            proxy = proxies.get_nowait()
            # A None entry in the queue marks the end of the results.
            if proxy is None:
                break
            print(str(proxy))
            proxy_list.append([country, proxy.host + ":" + str(proxy.port)])
    return proxy_list

And the error message:

../app/main/download_thread.py:344: in update_proxies 
proxy_list = gather_proxies(country_list) 
../app/main/download_thread.py:368: in gather_proxies 
    loop.run_until_complete(broker.find(limit=2, countries=country, types=types)) 
/usr/lib/python3.5/asyncio/base_events.py:387: in run_until_complete 
    return future.result() 
/usr/lib/python3.5/asyncio/futures.py:274: in result 
    raise self._exception 
/usr/lib/python3.5/asyncio/tasks.py:241: in _step 
    result = coro.throw(exc) 
../venv/lib/python3.5/site-packages/proxybroker/api.py:108: in find 
    await self._run(self._checker.check_judges(), action) 
../venv/lib/python3.5/site-packages/proxybroker/api.py:114: in _run 
    await tasks 
/usr/lib/python3.5/asyncio/futures.py:361: in __iter__ 
    yield self # This tells Task to wait for completion. 
/usr/lib/python3.5/asyncio/tasks.py:296: in _wakeup 
    future.result() 
/usr/lib/python3.5/asyncio/futures.py:274: in result 
    raise self._exception 
/usr/lib/python3.5/asyncio/tasks.py:241: in _step 
    result = coro.throw(exc) 
../venv/lib/python3.5/site-packages/proxybroker/checker.py:26: in check_judges 
    await asyncio.gather(*[j.check() for j in self._judges]) 
/usr/lib/python3.5/asyncio/futures.py:361: in __iter__ 
    yield self # This tells Task to wait for completion. 
/usr/lib/python3.5/asyncio/tasks.py:296: in _wakeup 
    future.result() 
/usr/lib/python3.5/asyncio/futures.py:274: in result 
    raise self._exception 
/usr/lib/python3.5/asyncio/tasks.py:239: in _step 
    result = coro.send(None) 
../venv/lib/python3.5/site-packages/proxybroker/judge.py:62: in check 
    page = await resp.text() 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ClientResponse(http://ip.spys.ru/) [200 OK]> 
<CIMultiDictProxy('Date': 'Thu, 18 Aug 2016 11:02:53 GMT', 'Server': 'Ap...': 'no-cache', 'Vary': 'Accept-Encoding', 'Transfer-Encoding': 'chunked', 'Content-Type': 'text/html; charset=UTF-8')> 

encoding = 'utf-8' 

    @asyncio.coroutine 
    def text(self, encoding=None): 
     """Read response payload and decode.""" 
     if self._content is None: 
      yield from self.read() 

     if encoding is None: 
      encoding = self._get_encoding() 

>  return self._content.decode(encoding) 
E  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 5568: invalid continuation byte 

../venv/lib/python3.5/site-packages/aiohttp/client_reqrep.py:758: UnicodeDecodeError 

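The problem seems to be inside proxybroker, or more precisely inside the aiohttp package. But since these are supposed to be tested packages, the problem is probably in my code.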

Can anyone see what I am doing wrong, or does anyone have suggestions on how to use proxybroker?

Answers

1

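The problem is in the resp.text() call, which retrieves the HTML page as text. aiohttp tries to use the chardet library to determine the correct encoding, but for malformed pages that is not possible.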

I think resp.text() should be replaced with resp.read(), which fetches the page as bytes without decoding it to str.
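A minimal sketch of the difference, using a hand-made byte string rather than a real aiohttp response. The `payload` value and the `decode_strict` helper are hypothetical stand-ins: the traceback shows that text() ends in a strict `decode(encoding)` call, which this helper imitates, while read() would hand back the raw bytes untouched.

```python
# Hypothetical payload standing in for the page body: mostly ASCII, plus one
# byte (0xd6) that is not followed by a valid UTF-8 continuation byte.
payload = b"<html>\xd6 proxy judge</html>"


def decode_strict(raw, encoding="utf-8"):
    """Imitate a text()-style helper: decode with the default strict error handling."""
    return raw.decode(encoding)


try:
    decode_strict(payload)
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decode failed:", exc.reason)

# Working on the bytes directly needs no decoding at all.
has_marker = b"proxy judge" in payload
print("marker found in raw bytes:", has_marker)
```

Searching the raw bytes only works as long as whatever is being looked for can itself be written as bytes, which is an assumption about how the page is used afterwards.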

+0

Thanks, I filed an issue with proxybroker! That does seem to be the problem. – SSchneid

+1

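If I change resp.text() to resp.read() and get a bytes object instead of a string, I still have to convert it to a string. But that conversion will always raise a decode error, because there is a byte in the response that cannot be read, right? – SSchneid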

+1

For proxybroker, simply decoding with 'latin1' is enough. It never fails. –
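The reason 'latin1' cannot fail is that it maps each of the 256 possible byte values to exactly one character. A small sketch of that property follows, with `payload` again a hypothetical stand-in for the page body. The `errors="replace"` variant is an alternative not mentioned in the thread, shown only for comparison.

```python
# Every byte value from 0 to 255 decodes under latin1, and the result
# encodes back to the same bytes.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin1")
round_trips = decoded.encode("latin1") == all_bytes
print("latin1 round trip ok:", round_trips)

# Hypothetical payload with a byte that is invalid as UTF-8.
payload = b"<html>\xd6 proxy judge</html>"

# latin1 never raises, but non-ASCII bytes may come out as the wrong character
# if the page was really in another encoding.
as_latin1 = payload.decode("latin1")

# Alternative (an assumption, not from the thread): keep UTF-8 but substitute
# U+FFFD for any undecodable byte.
as_replaced = payload.decode("utf-8", errors="replace")

print(as_latin1)
print(as_replaced)
```

Either way the ASCII parts of the page survive unchanged, which is presumably all that matters if the text is only scanned for things like IP addresses.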
