BeautifulSoup: 'HTTPResponse' object has no attribute 'encode'

I am trying to get BeautifulSoup working with a URL, as follows:

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
html = urlopen("http://proxies.org") 
soup = BeautifulSoup(html.encode("utf-8"), "html.parser") 
print(soup.find_all('a'))

However, I get an error:

File "c:\Python3\ProxyList.py", line 3, in <module> 
    html = urlopen("http://proxies.org").encode("utf-8") 
AttributeError: 'HTTPResponse' object has no attribute 'encode'

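Any idea why? Is it something to do with the urlopen function? Why does it need utf-8?

There clearly seem to be some differences between Python 3 and BeautifulSoup4 compared with the examples that are given (which now seem to be outdated or wrong)...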


2017-01-29 Ke.

This should be closed; this is the solution that was needed - http://stackoverflow.com/questions/32382686/unicodeencodeerror-charmap-codec-cant-encode-character-u2010-character-m

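It doesn't work because urlopen returns an HTTPResponse object, and you are treating it as if it were the HTML itself. You need to chain the .read() method onto the response to get the HTML: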

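from urllib.request import urlopen
from bs4 import BeautifulSoup

# urlopen() returns an HTTPResponse; read() gives the body as bytes
response = urlopen("http://proxies.org")
html = response.read()
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
print(soup.find_all('a'))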

You probably also want html.decode("utf-8") rather than html.encode("utf-8"): read() returns bytes, and decode() is what turns bytes into a str.
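To make the bytes/str distinction concrete, here is a minimal offline sketch. It assumes an io.BytesIO object as a stand-in for the response (fake_response is a hypothetical name, not part of urllib), so it runs without a network connection:

```python
import io

# Hypothetical stand-in for what urlopen() returns: a binary file-like object.
fake_response = io.BytesIO("<a href='/x'>caf\u00e9</a>".encode("utf-8"))

# The response object itself is not text, so it has no .encode() method.
print(hasattr(fake_response, "encode"))  # False

# read() returns the body as bytes; bytes have .decode() but no .encode().
raw = fake_response.read()
print(type(raw).__name__)        # bytes
print(hasattr(raw, "encode"))    # False

# decode() converts the bytes into a str using the given encoding.
html = raw.decode("utf-8")
print(type(html).__name__)       # str
```

In Python 3, encode() goes from str to bytes and decode() goes from bytes to str, which is why calling encode() on the response (or on the bytes it returns) raises AttributeError.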


2017-01-29 20:29:56

Hi Josh, this still doesn't work for me. I am using exactly the same code as you and it gives me a "character maps to" error
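Going by the question linked in the earlier comment, a "character maps to" message is probably a UnicodeEncodeError raised while printing, not while parsing. The sketch below reproduces that kind of error offline, assuming a cp1252 console encoding and a string containing U+2010 (both are assumptions for illustration, not details confirmed in this thread):

```python
# Hypothetical text containing U+2010 (HYPHEN), which cp1252 cannot represent.
text = "proxy\u2010list"

try:
    # Roughly what print() has to do on a console that uses cp1252.
    text.encode("cp1252")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed)   # True
print(message)

# One workaround: replace characters the target encoding cannot represent.
safe = text.encode("cp1252", errors="replace").decode("cp1252")
print(safe)     # proxy?list
```

If that is the cause, another option on Python 3.7+ is to call sys.stdout.reconfigure(encoding="utf-8") before printing, provided the terminal can display UTF-8.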

Try this:

# read() returns bytes; decode() (not encode()) turns them into a str
soup = BeautifulSoup(html.read().decode('utf-8'), "html.parser")


2017-01-29 20:41:18 orvi

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
html = urlopen("http://proxies.org") 
soup = BeautifulSoup(html, "html.parser") 
print(soup.find_all('a'))

First, urlopen returns a file-like object.
BeautifulSoup accepts a file-like object and decodes it automatically, so you don't need to worry about the encoding.
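As a rough offline illustration of that behaviour, the sketch below feeds BeautifulSoup an io.BytesIO object in place of a real response. The markup and the name fake_response are made up for the example:

```python
import io
from bs4 import BeautifulSoup

# Made-up UTF-8 page; BytesIO stands in for the object urlopen() returns.
raw = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><a href="/one">caf\u00e9</a><a href="/two">two</a></body></html>'
).encode("utf-8")
fake_response = io.BytesIO(raw)

# No read(), encode(), or decode() here: BeautifulSoup reads the file-like
# object and works out the encoding of the bytes on its own.
soup = BeautifulSoup(fake_response, "html.parser")

links = [(a.get("href"), a.get_text()) for a in soup.find_all("a")]
print(links)
```

The non-ASCII link text comes back as an ordinary str, so no manual encoding step is needed before parsing.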

From the documentation:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

from bs4 import BeautifulSoup 

soup = BeautifulSoup(open("index.html")) 

soup = BeautifulSoup("<html>data</html>")

First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.


2017-01-30 05:58:11
