Preserving escaped characters when parsing XML in Python

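I'm trying to write a Python script that takes one or two XML files as input and writes one or two new files based on their contents. I was trying to write this script with the minidom module. However, the input files contain a number of instances of the escaped character &#xa; inside node attributes. Unfortunately, in the output files these characters have been converted to a different character, which appears to be a newline.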

For example, a line in the input file such as:

<Entry text="For English For Hearing Impaired&#xa;Press 3 on Keypad"

is written to the output as:

<Entry text="For English For Hearing Impaired 
Press 3 on Keypad"

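I read that minidom causes this because it doesn't allow escaped characters in XML attributes (I think). Is this true? And if so, what is the best tool or method for parsing an XML file into a Python document, manipulating nodes and exchanging them with other documents, and writing the documents back out to new files?

If it helps, I was also parsing and saving these files with the 'utf-8' encoding. I don't know whether that is part of the problem. Thanks to anyone who can help.

-Alex Kaiser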


2010-10-28 Pyrobug

I haven't found a need for Python's standard XML modules since I started using lxml. It can do everything you're asking for. For example...

input.xml:

<?xml version="1.0" encoding='utf-8'?> 
<root> 
    <Button3 yposition="250" fontsize="16" language1="For English For Hearing Impaired&#xa;Press 3 on Keypad" /> 
</root>

And then:

>>> from lxml import etree 
>>> with open('input.xml', 'rb') as f:  # binary mode lets the parser honor the declared encoding
...  root = etree.parse(f) 
... 
>>> buttons = root.xpath('//Button3') 
>>> buttons 
[<Element Button3 at 101071f18>] 
>>> buttons[0] 
<Element Button3 at 101071f18> 
>>> buttons[0].attrib 
{'yposition': '250', 'language1': 'For English For Hearing Impaired\nPress 3 on Keypad', 'fontsize': '16'} 
>>> buttons[0].attrib['foo'] = 'bar' 
>>> s = etree.tostring(root, xml_declaration=True, encoding='utf-8', pretty_print=True) 
>>> print(s.decode('utf-8'))  # tostring() returns bytes on Python 3 when an encoding is given
<?xml version='1.0' encoding='utf-8'?>
<root>
    <Button3 yposition="250" fontsize="16" language1="For English For Hearing Impaired&#10;Press 3 on Keypad" foo="bar"/>
</root>
>>> with open('output.xml', 'wb') as f:  # write the bytes unchanged
...  f.write(s)
>>>
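For comparison, here is a minimal sketch of the same round trip using only the standard library's xml.etree.ElementTree, in case lxml is not available. The input string is a shortened stand-in for input.xml, and the variable names are illustrative. It assumes Python 3, where ElementTree writes a newline inside an attribute value back out as the character reference &#10;.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for input.xml (assumed for illustration).
source = (
    '<root><Button3 yposition="250" '
    'language1="For English For Hearing Impaired&#xa;Press 3 on Keypad" /></root>'
)

root = ET.fromstring(source)
button = root.find("Button3")

# The parsed attribute holds a real newline character.
parsed_value = button.get("language1")

# Modify the element, then serialize the tree back to text.
button.set("foo", "bar")
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The reference comes back in decimal form (&#10;) rather than the original hexadecimal form (&#xa;). Both denote the same character, so an XML parser reading the output sees the same attribute value.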


2010-10-28 01:46:33 snapshoe


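&#xa; is the XML character reference for character 0x0A, which is a newline. The parser is correctly parsing the XML and giving you the character it denotes. If you want to forbid newlines in attributes, or handle them some other way, you are free to do whatever you like with them once the parser has handed them to you.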
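A small sketch of that behavior with minidom, the module used in the question. The Entry element and its shortened text are made up for illustration. It parses the same attribute written once with a hexadecimal reference and once with a decimal one, then compares the decoded values.

```python
from xml.dom import minidom

# The same attribute value written with a hex and a decimal character reference.
hex_doc = minidom.parseString('<Entry text="Impaired&#xa;Press 3"/>')
dec_doc = minidom.parseString('<Entry text="Impaired&#10;Press 3"/>')

hex_value = hex_doc.documentElement.getAttribute("text")
dec_value = dec_doc.documentElement.getAttribute("text")

# Both references decode to one newline character (code point 0x0A).
print(repr(hex_value))
print(hex_value == dec_value)
```

Once parsed, the value is an ordinary Python string containing "\n". Whether the reference reappears in the output depends on how the serializer escapes attribute values when the document is written.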


2010-10-28 02:51:33

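Unfortunately, the standard xml module has no option to turn this unescaping off. So the best option for me was to escape the text back using the function that ElementTree itself uses for this purpose (the escape function from xml.sax.saxutils does not escape \n):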

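from xml.etree import ElementTree
text = ElementTree._escape_attrib(text)  # private helper; on Python 3 it takes only the text (Python 2 also took an encoding such as 'utf-8')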

Text in the source XML:

Here is a test message&#10;With newline &amp; ampersand

Text after "decoding":

Here is a test message
With newline & ampersand

Text after "escaping back":

Here is a test message&#10;With newline &amp; ampersand
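Because _escape_attrib is a private helper whose signature has changed between Python versions, a sketch that relies only on public API may be safer. The version below is an alternative to the call above, not the same technique. It uses xml.sax.saxutils.escape, which accepts a dictionary of extra replacements. The names ATTRIBUTE_ESCAPES and escape_attribute are hypothetical, chosen for this example.

```python
from xml.sax.saxutils import escape

# Extra replacements applied after escape() has handled &, < and >.
ATTRIBUTE_ESCAPES = {'"': "&quot;", "\n": "&#10;", "\r": "&#13;", "\t": "&#9;"}


def escape_attribute(text):
    """Return text escaped for use inside a double-quoted XML attribute."""
    return escape(text, ATTRIBUTE_ESCAPES)


decoded = "Here is a test message\nWith newline & ampersand"
escaped_back = escape_attribute(decoded)
print(escaped_back)
# prints: Here is a test message&#10;With newline &amp; ampersand
```

escape() replaces the ampersand first and only then applies the extra entries, so the & in the inserted &#10; is not escaped a second time.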


2016-05-18 07:30:22 Jimilian
