2013-03-16 53 views
3

Python response decoding

For the following lines using urllib:

import urllib.request

# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

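What format is the string that read() returns? I have been trying to work it out from the Python documentation, but it doesn't mention it at all. Why is there a decode? Does decode decode the object from UTF-8, or to UTF-8? From what format and to what format does it decode? The decode documentation doesn't mention that either. Is the Python documentation just terrible, or is there some standard convention I don't understand?

I want to store the HTML in a UTF-8 file. Do I just do a normal write, or do I need to "encode" it back into something and write that?

Note: I know urllib is deprecated, but I can't switch right now.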

+0

Thanks for the downvote without a comment...? – darksky 2013-03-16 20:33:51

+3

[How to stop the pain?](http://www.youtube.com/watch?v=sgHbC6udIqc) – root 2013-03-16 20:35:14

+0

Awesome, thanks @root! – darksky 2013-03-16 20:38:11

Answers

0

Ask Python itself (this session uses Python 2's urllib):

>>> import urllib
>>> r=urllib.urlopen("http://google.com")
>>> a=r.read() 
>>> type(a) 
0: <type 'str'> 
>>> help(a.decode) 
Help on built-in function decode: 

decode(...) 
    S.decode([encoding[,errors]]) -> object 

    Decodes S using the codec registered for encoding. encoding defaults 
    to the default encoding. errors may be given to set a different error 
    handling scheme. Default is 'strict' meaning that encoding errors raise 
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' 
    as well as any other name registered with codecs.register_error that is 
    able to handle UnicodeDecodeErrors. 

>>> b = a.decode('utf8') 
>>> type(b) 
1: <type 'unicode'> 
>>> 

So it appears that read() returns a str, and .decode() decodes it from UTF-8 into Python's internal Unicode format.
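For the Python 3 code in the question (urllib.request), the same idea holds with different type names: read() returns bytes, and .decode("utf8") turns those bytes into a str. Here is a minimal sketch, assuming the page really is UTF-8 encoded. The sample bytes and the file name page.html are made up for illustration and stand in for response.read(), so nothing is fetched from the network:

```python
import os
import tempfile

# Hypothetical stand-in for response.read(): raw UTF-8 bytes off the wire.
raw = "<p>caf\u00e9 \u2713</p>".encode("utf-8")
print(type(raw))   # <class 'bytes'>

# decode() goes FROM UTF-8 bytes TO a text string (str in Python 3).
html = raw.decode("utf-8")
print(type(html))  # <class 'str'>

# Hypothetical output location for the example.
path = os.path.join(tempfile.mkdtemp(), "page.html")

# Option 1: write text and let open() encode it back to UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Option 2: skip decoding and write the original bytes unchanged.
with open(path, "wb") as f:
    f.write(raw)

# Either way, the file holds the same UTF-8 bytes.
with open(path, "rb") as f:
    on_disk = f.read()
```

If the bytes are already UTF-8 and you only want to save them, option 2 avoids the decode/encode round trip. Decode only when you need to work with the text itself.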

+0

For some reason, the 'decode()' doc page I was looking at was different. Thanks – darksky 2013-03-16 20:39:52

+0

So 'str' doesn't support all Unicode characters, hence the 'decode()' chained after 'read()'? – darksky 2013-03-16 20:41:49