Writing an XML file in Python corrupts the file

I am trying to write the contents of an xml.dom.minidom object to a file. The simple idea is to use the 'writexml' method:

import codecs
from xml.dom import minidom

def write_xml_native():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    f = codecs.open('codified.xml', mode='w', encoding='utf-8')
    # Using native writexml() method to write
    xmldoc.writexml(f, encoding="utf-8")
    f.close()

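The problem is that it corrupts the non-Latin text in the file. Another way is to get the document as a text string and write it to the file explicitly: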

def write_xml(): 
    # Building DOM from XML 
    xmldoc = minidom.parse('semio2.xml') 
    # Opening file for writing UTF-8, which is XML's default encoding 
    f = codecs.open('codified3.xml', mode='w', encoding='utf-8') 
    # Writing XML in UTF-8 encoding, as recommended in the documentation 
    f.write(xmldoc.toxml("utf-8")) 
    f.close()

This results in the following error:

Traceback (most recent call last):
  File "D:\Projects\Semio\semioparser.py", line 45, in <module>
    write_xml()
  File "D:\Projects\Semio\semioparser.py", line 42, in write_xml
    f.write(xmldoc.toxml(encoding="utf-8"))
  File "C:\Python26\lib\codecs.py", line 686, in write
    return self.writer.write(data)
  File "C:\Python26\lib\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2064: ordinal not in range(128)

How do I write XML text to a file? What am I missing?

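Edit: the error was resolved by adding a decode call: f.write(xmldoc.toxml("utf-8").decode("utf-8")). However, the Russian characters are still corrupted.

When I view the text in the interpreter it is not corrupted, but it is once it has been written to the file.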
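
One way to read the traceback, assuming Python 2 as the C:\Python26 paths suggest: toxml("utf-8") already returns an encoded byte string, while the file object from codecs.open expects text and does the encoding itself. Python 2 then tries an implicit ASCII decode of the bytes first, which fails on byte 0xd0. A minimal sketch of keeping a single encode step follows; the sample document and the temporary output path are assumptions made for illustration, not the original semio2.xml.

```python
import codecs
import os
import tempfile
import xml.dom.minidom as minidom

# Hypothetical sample document standing in for semio2.xml
doc = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))

encoded = doc.toxml("utf-8")  # byte string, already UTF-8 encoded
text = doc.toxml()            # text string, not yet encoded

# Hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "codified3.xml")
with codecs.open(out_path, mode="w", encoding="utf-8") as f:
    # Pass text, so the codecs writer performs the only encode step
    f.write(text)
```

Note that toxml() without an encoding argument omits the encoding attribute from the XML declaration, which is acceptable here because UTF-8 is the default encoding for XML.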

2010-12-19 martinthenext

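Just a thought: are you sure you are not viewing the file incorrectly? Maybe the reader expects an encoding other than utf-8, and that is why it looks borked. – Nubsis 2010-12-29 12:28:11

@Nubsis That is exactly what happened. The viewer was expecting ASCII encoding. I will keep the thread open, since the use of .decode() was also a problem. Thanks! – martinthenext 2011-01-09 19:08:08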
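
Since the viewer turned out to be the culprit, it can help to check the file's raw bytes rather than trust how an editor renders them. The sketch below is one possible check, not something from the thread: the helper name is_valid_utf8 and the temporary demo file are assumptions made for illustration.

```python
import codecs
import os
import tempfile
import xml.dom.minidom as minidom


def is_valid_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8."""
    with open(path, "rb") as raw:
        data = raw.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical demo file written the same way as in the answers
demo_path = os.path.join(tempfile.mkdtemp(), "out.xml")
doc = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))
with codecs.open(demo_path, "w", "utf-8") as out:
    doc.writexml(out)

# Read the bytes back so they can be compared with the expected encoding
with open(demo_path, "rb") as raw:
    written = raw.read()
```

If is_valid_utf8 returns True and the expected UTF-8 byte sequence is present in the file, the garbled display most likely comes from the viewer's encoding setting rather than from the write itself.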

Hmm, this should work though:

import codecs
import xml.dom.minidom as minidom

xml = minidom.parse("test.xml")
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)

Alternatively, you can try:

with codecs.open("test.xml", "r", "utf-8") as inp: 
    xml = minidom.parseString(inp.read().encode("utf-8")) 
with codecs.open("out.xml", "w", "utf-8") as out: 
    xml.writexml(out)

Update: if you construct the XML from a string object, you should encode it before passing it to the minidom parser, like this:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import codecs 
import xml.dom.minidom as minidom 

xml = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8")) 
with codecs.open("out.xml", "w", "utf-8") as out: 
    xml.writexml(out)

2010-12-19 18:09:13

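Thanks for your answer. I tested all of your code and none of it works for me. Even the last part, which has nothing to do with opening an XML file, turns the Russian string into nonsense. That means the problem is in writing UTF-8 to the file. Any other ideas? – martinthenext 2010-12-19 19:15:34

@martinthenext: I am almost sure that either you are getting valid "utf-8" (all 3 examples work for me, on both Windows & Linux and on Python 2.5, 2.6 and 2.7) or your Python installation is broken; here is a screenshot: http://img190.imageshack.us/img190/9072/minidom.png – 2010-12-19 20:03:19

Wait, the output in the interpreter itself is fine, no problem there. It gets corrupted when it is written to the file. How can I fix this? – martinthenext 2010-12-19 20:09:31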

Try this:

with open("codified.xml", "w") as f: 
    f.write(xmldoc.toxml("utf-8").decode("utf-8"))

This works for me (under Python 3, though).
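
A caveat for Python 3: open() in text mode uses the platform's default encoding unless an encoding argument is given, so the result of the snippet above can differ between machines. A possible alternative, sketched here under the assumption of Python 3, is to write the bytes returned by toxml(encoding="utf-8") in binary mode. The sample document and the temporary path are illustrative assumptions.

```python
import os
import tempfile
import xml.dom.minidom as minidom

# Hypothetical sample document standing in for the real input
doc = minidom.parseString("<ru>Тест</ru>".encode("utf-8"))

# Hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "codified.xml")

# In Python 3, toxml(encoding=...) returns bytes, so no text-mode
# encoding is involved when the file is opened with "wb"
with open(out_path, "wb") as f:
    f.write(doc.toxml(encoding="utf-8"))

# Read the bytes back for inspection
with open(out_path, "rb") as f:
    payload = f.read()
```

Written this way, the XML declaration also carries encoding="utf-8", which gives XML-aware readers a hint about how to decode the file.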

2010-12-19 17:48:17

Nope, it still corrupts non-Latin characters. – martinthenext 2010-12-19 17:56:15

What happens if you do 'x = codecs.open("semio2.xml", encoding="utf-8")' and then 'xmldoc = minidom.parse(x)'? – 2010-12-19 18:03:07

It says 'UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)'. I don't understand why. – martinthenext 2010-12-19 19:22:50
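
The character u'\ufeff' in that message is the Unicode byte-order mark (BOM), which suggests the input file starts with one. Decoding with plain "utf-8" keeps the BOM as the first character of the text, whereas Python's "utf-8-sig" codec strips it. The sketch below illustrates the difference on an in-memory sample; the sample bytes are an assumption and do not come from the original semio2.xml.

```python
import codecs
import xml.dom.minidom as minidom

# Hypothetical input: a UTF-8 document that starts with a byte-order mark
raw_bytes = codecs.BOM_UTF8 + u"<ru>Тест</ru>".encode("utf-8")

# Plain "utf-8" keeps the BOM as the character U+FEFF at position 0
with_bom = raw_bytes.decode("utf-8")

# "utf-8-sig" removes the BOM while decoding
text = raw_bytes.decode("utf-8-sig")

# Encode back to bytes before handing the document to the parser,
# as the earlier answer recommends
doc = minidom.parseString(text.encode("utf-8"))
root_text = doc.documentElement.firstChild.data
```

For a file on disk, the same codec name can be passed to codecs.open, for example codecs.open("semio2.xml", encoding="utf-8-sig").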
