I encode XML with "GBK": why can't Python's xml fromstring decode GBK-encoded data?
#!/usr/bin/env python
# encoding: utf-8
from xml.etree.ElementTree import Element, SubElement, tostring, fromstring, XML, XMLParser
root = Element('root')
child = SubElement(root, "child")
child.text = u"中文"
result = tostring(root, encoding="gbk")
print(result)
print(result.decode("gbk"))
This produces the following output:
b"<?xml version='1.0' encoding='gbk'?>\n<root><child>\xd6\xd0\xce\xc4</child></root>"
Then I tried to parse the XML, and wrote it like this:
tree = XML(result.decode("gbk"))
print(tree[0].text)
tree = XML(result.decode("gbk"), parser=XMLParser(encoding="gbk"))
print(tree[0].text)
tree = XML(result.decode("gbk"), parser=XMLParser(encoding="utf-8"))
print(tree[0].text)
I found that all three variants work in Python 3.6, but none of them work in Python 2.7. In Python 2.7 the error is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 50-51: ordinal not in range(128)
So I have two questions:
- Why do XMLParser(encoding="gbk") and XMLParser(encoding="utf-8") return the same result in Python 3.6?
- How can I make the XML parser work correctly in Python 2.7? (I don't think result.decode('gbk').encode('utf8').replace('GBK', 'utf-8') is a good idea.)
Have you set the encoding in your script? –
@cᴏʟᴅsᴘᴇᴇᴅ I don't understand. – roger
Set this as the first line of your script: '# -*- coding: utf-8 -*-' –
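One possible workaround, offered as a sketch rather than a definitive answer: re-encode the GBK bytes as UTF-8 and hand the parser bytes instead of decoded text, while passing XMLParser(encoding="utf-8") so that the stale encoding='gbk' declaration is overridden rather than edited with a string replace. The helper name parse_gbk_xml is hypothetical, not part of ElementTree. As far as I understand, Python 3 re-encodes a str to UTF-8 internally before parsing, which would explain why the encoding argument makes no visible difference there, whereas Python 2.7 tries to encode the unicode string as ASCII and fails.

```python
# -*- coding: utf-8 -*-
from xml.etree.ElementTree import Element, SubElement, tostring, XML, XMLParser


def parse_gbk_xml(data):
    """Hypothetical helper: parse GBK-encoded XML bytes.

    The bytes are re-encoded as UTF-8, and the parser is told to use
    UTF-8 regardless of the encoding named in the XML declaration.
    """
    utf8_bytes = data.decode("gbk").encode("utf-8")
    # The encoding argument overrides the declaration in the document,
    # so the leftover encoding='gbk' header does not need to be rewritten.
    return XML(utf8_bytes, parser=XMLParser(encoding="utf-8"))


# Build the same document as in the question.
root = Element("root")
child = SubElement(root, "child")
child.text = u"中文"
result = tostring(root, encoding="gbk")

tree = parse_gbk_xml(result)
print(tree[0].text)
```

This assumes the input really is GBK; if the bytes may arrive in other encodings, the decode step would need to read the declaration first.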