Accessing XML data that is not tree-structured in Python

I have several XML files that I want to parse in Python. I know about Python's ElementTree package, but my XML files are not stored in a tree-like structure. Here is an example:

<tag1 attribute1="at1" attribute2="at2">My files are text that I annotated with a tool 
to create these xml files.</tag1> 
Some parts of the text are enclosed in an xml tag, whereas others are not. 
<tag1 attribute1="at1" attribute2="at2"><tag2 attribute3="at3" attribute4="at4">Some 
are even enclosed in multiple tags.</tag1></tag2> 
And some have overlapping tags: 
<tag1 attribute1="at1" attribute2="at2">This is an example sentence 
<tag3 attribute5="at5">containing a nested example sentence</tag3></tag1>

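Whenever I parse a file with the functions of the ElementTree class, I can only access the first tag. I am looking for a way to parse all the tags that does not require a tree-like structure. Any help is greatly appreciated.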

Source

2017-04-14 imc

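If your example is accurate, that is not valid XML. In the second case you open tag1, open tag2, and then close tag1! Some libraries try to guess at malformed XML, but please first confirm that your example is accurate. – Javier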

Also, post how you are currently trying to access the elements. – Javier

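XML is well-formed by definition. This markup cannot be used with a compliant XML library such as etree. Now, if all of this is wrapped in a root tag that you did not post, then it might be valid. – Parfait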
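To make the well-formedness point in the comments concrete, here is a minimal sketch (not from the thread) that wraps a fragment in a synthetic root element and checks whether ElementTree accepts it. The helper name `is_well_formed`, the `<root>` wrapper, and the two shortened sample strings are assumptions made for illustration. Properly nested tags with loose text around them should parse once wrapped, while the overlapping `tag1`/`tag2` case should still be rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses once wrapped in a synthetic root."""
    try:
        # The <root> wrapper is hypothetical; it only supplies a single
        # top-level element so that loose text and sibling tags are allowed.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# Properly nested tags followed by untagged text.
nested = '<tag1 a="1">outer <tag3 b="2">inner</tag3></tag1> trailing text'
# tag1 is closed while tag2 is still open, as in the question's second case.
overlapping = '<tag1 a="1"><tag2 b="2">text</tag1></tag2>'

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

A root wrapper only helps with missing top-level structure; it cannot repair tags that are closed in the wrong order.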

If each line contains only one XML fragment, just parse each line separately.

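import xml.etree.ElementTree as ET
for line in some_file:
    # Each line must be one well-formed fragment, otherwise ET raises ParseError.
    root = ET.fromstring(line)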
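The per-line idea above can be sketched a little more fully as follows. This is only an illustration under the answer's own assumption that every line holds at most one complete fragment; the generator name `parse_lines` and the sample lines are hypothetical. Lines that ElementTree cannot parse (plain text, or tags that span several lines as in the question's sample) are reported as `None` instead of aborting the loop.

```python
import xml.etree.ElementTree as ET


def parse_lines(lines):
    """Yield (line_number, element_or_None, text) for each non-empty line."""
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            # fromstring returns the root element of the fragment directly.
            yield number, ET.fromstring(text), text
        except ET.ParseError:
            # Untagged text or a malformed fragment: keep the raw line.
            yield number, None, text


sample = [
    '<tag1 attribute1="at1">first fragment</tag1>',
    'plain text without any tag',
    '<tag1 attribute1="at1"><tag3 attribute5="at5">nested</tag3></tag1>',
]

for number, element, raw in parse_lines(sample):
    if element is None:
        print(number, "not parseable:", raw)
    else:
        # iter() walks the root and all descendants, so nested tags are seen too.
        print(number, [(e.tag, e.attrib) for e in element.iter()])
```

Using `element.iter()` rather than inspecting only the root is what gives access to every tag in a fragment, not just the first one.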

Source

2017-04-14 12:48:12 Javier
