UTF-16 string decoding problem

I'm using Python 3.3. I've been trying to decode a certain string that looks like this:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed:\xf9w\xdaH\xd2?\xcf\xbc....

and so on. However, whenever I try to decode it with bytes.decode('utf-16'), I get an error saying:

'utf16' codec can't decode bytes in position 54-55: illegal UTF-16 surrogate

I'm not entirely sure how to decode this string.

2016-04-27 Cristian

So that means it isn't really UTF-16. Where did you get the string? Could it be UCS-2? – RemcoGerlich

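If you decode only up to position 53, does the result look sensible? That might help you decide whether your assumption of 'utf16' is correct. – mkiever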
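mkiever's suggestion can be tried directly: a UnicodeDecodeError records where decoding failed in its `start` attribute, so you can decode everything before that point and inspect it. A minimal sketch; `decode_prefix` is a hypothetical helper, and the sample bytes are invented (a little-endian BOM, then "hi", then an unpaired high surrogate) because the question's full buffer isn't available:

```python
def decode_prefix(data, encoding='utf-16'):
    """Decode `data`; on failure, return the text decoded up to the bad bytes
    together with the exception (hypothetical helper for illustration)."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first code unit the codec rejected
        return data[:err.start].decode(encoding), err

# Invented sample: BOM (LE), 'h', 'i', then a high surrogate followed by '!'
sample = b'\xff\xfeh\x00i\x00\x00\xd8!\x00'
text, err = decode_prefix(sample)
print(repr(text))   # 'hi'
print(err.start)    # 6
print(err.reason)
```

On the question's real data, the prefix before byte 54 would most likely come out as meaningless characters rather than HTML, which itself suggests the bytes are not UTF-16 text.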

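I got it from Twisted: in twisted/web/proxy.py, inside the handleResponsePart(self, buffer) function, I simply injected print(buffer). So basically the encoded string you're seeing should be HTML that I receive from the Twisted proxy. – Cristian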

Gzipped data begins with \x1f\x8b\x08, so my guess is that your data is gzipped. Try gunzipping the data before decoding it.
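The gzip guess can be checked before decoding anything: every gzip member starts with the magic bytes 0x1f 0x8b, and a third byte of 0x08 means the deflate compression method (RFC 1952). A minimal standard-library sketch; `maybe_gunzip` is a hypothetical helper name and the HTML payload is invented for the demo:

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip member (RFC 1952)

def maybe_gunzip(data):
    """Return `data` decompressed if it starts with the gzip magic, else unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)  # one-shot helper, available since Python 3.2
    return data

# Round trip with an invented HTML payload
html = '<html><body>hello</body></html>'.encode('utf-8')
payload = gzip.compress(html)
print(payload[:3])                            # b'\x1f\x8b\x08'
print(maybe_gunzip(payload).decode('utf-8'))  # the original HTML
```

In a proxy, a `Content-Encoding: gzip` response header is the more reliable signal; sniffing the magic bytes is only a fallback.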

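import io
import gzip

# This raises an exception because `buf` is truncated here (the exact exception
# type depends on the Python version). It should work once you supply the complete buffer.
buf = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed:\xf9w\xdaH\xd2?\xcf\xbc'
with gzip.GzipFile(fileobj=io.BytesIO(buf)) as f:
    content = f.read()
    # Decode with the charset the response declares (e.g. in its Content-Type header);
    # 'utf-16' here simply follows the question's assumption.
    print(content.decode('utf-16'))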
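One caveat for the Twisted case: handleResponsePart is likely called several times per response, each time with only part of the body, so a single buffer is usually an incomplete gzip stream (as the comment in the snippet above notes). A sketch, assuming the pieces arrive in order, that feeds them to a zlib decompressor one at a time; `gunzip_chunks` is a hypothetical helper, and `16 + zlib.MAX_WBITS` tells zlib to expect a gzip header and trailer:

```python
import gzip
import zlib

def gunzip_chunks(chunks):
    """Decompress a gzip stream that arrives in pieces (hypothetical helper)."""
    # 16 + MAX_WBITS: parse a gzip wrapper instead of a raw zlib stream
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = []
    for chunk in chunks:
        out.append(d.decompress(chunk))
    out.append(d.flush())
    # Decompress.eof (Python 3.3+) is True only once the gzip trailer was read
    if not d.eof:
        raise ValueError('gzip stream is incomplete')
    return b''.join(out)

body = gzip.compress(('<p>' + 'x' * 1000 + '</p>').encode('utf-8'))
# Simulate a response arriving in 7-byte pieces, as a proxy callback might see it
pieces = [body[i:i + 7] for i in range(0, len(body), 7)]
print(gunzip_chunks(pieces).decode('utf-8')[:10])  # <p>xxxxxxx
```

Keeping one decompressor per response and calling it from each handleResponsePart invocation avoids buffering the whole body first; the `eof` check catches a response that was cut off.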

2016-04-27 20:02:08 unutbu

Thanks, this actually works really well! – Cristian
