Check whether an element has children

I retrieve an XML document like this:

import urllib2
import xml.etree.ElementTree as ET

root = ET.parse(urllib2.urlopen(url)) 
for child in root.findall("item"): 
    a1 = child[0].text # ok 
    a2 = child[1].text # ok 
    a3 = child[2].text # ok 
    a4 = child[3].text # BOOM 
    # ...

The XML looks like this:

<item> 
    <a1>value1</a1> 
    <a2>value2</a2> 
    <a3>value3</a3> 
    <a4> 
        <a11>value222</a11>
        <a22>value22</a22>
    </a4> 
</item>

How can I check whether a4 (in this particular case, but it could be any other element) has children?


2014-09-20 アレックス

You can try calling list on the element:

>>> xml = """<item> 
    <a1>value1</a1> 
    <a2>value2</a2> 
    <a3>value3</a3> 
    <a4> 
    <a11>value222</a11> 
    <a22>value22</a22> 
    </a4> 
</item>""" 
>>> root = ET.fromstring(xml) 
>>> list(root[0]) 
[] 
>>> list(root[3]) 
[<Element 'a11' at 0x2321e10>, <Element 'a22' at 0x2321e48>] 
>>> len(list(root[3])) 
2 
>>> print("has children" if len(list(root[3])) else "no child")
has children
>>> print("has children" if len(list(root[2])) else "no child")
no child
>>> # Or simpler, without a call to list within len, it also works:
>>> print("has children" if len(root[3]) else "no child")
has children

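I modified your example because calling findall on the item root does not work (findall searches the direct descendants, not the current element). If you want to access the text of the subchildren afterwards in your working program, you can do this: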

for child in root.findall("item"):
    # If there are children, get their text content as well.
    if len(child):
        for subchild in child:
            subchild.text
    # Otherwise just get the current child's text.
    else:
        child.text

This would be a good fit for recursion, though.
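The recursive idea mentioned above could look roughly like the following. This is only a sketch in Python 3: the helper name leaf_texts is made up for illustration, and the XML string is the sample from the question.

```python
import xml.etree.ElementTree as ET

XML = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""


def leaf_texts(element):
    """Return the text of every childless element at or below `element`."""
    # len(element) is the number of direct children.
    if len(element) == 0:
        return [element.text]
    texts = []
    for child in element:
        # Recurse into each child and collect what it returns.
        texts.extend(leaf_texts(child))
    return texts


root = ET.fromstring(XML)
print(leaf_texts(root))
# ['value1', 'value2', 'value3', 'value222', 'value22']
```

The whitespace-only text of a4 is skipped because a4 has children, so only the leaves contribute.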


2014-09-20 16:14:18 jlr

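It doesn't work. Can you iterate using my example? – 2014-09-20 16:28:46

It doesn't work because your loop yields no elements, since there is no element named 'item' – marscher 2014-09-20 16:36:01

Yes, it does yield them in my real application. – 2014-09-20 16:43:49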
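The disagreement in these comments probably comes down to where item sits in the tree. A small Python 3 sketch of both situations follows; the wrapper tag items is an assumption for illustration, since the real document is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# Case 1: <item> is the root itself, as in the sample XML.
root_is_item = ET.fromstring("<item><a1>value1</a1></item>")
# findall only looks at children of the element, never the element itself.
print(len(root_is_item.findall("item")))  # 0

# Case 2: <item> elements sit below a wrapper (hypothetical <items> tag).
wrapped = ET.fromstring("<items><item><a1>value1</a1></item><item/></items>")
print(len(wrapped.findall("item")))  # 2

# For each <item> found, report whether it has children.
flags = [len(item) > 0 for item in wrapped.findall("item")]
print(flags)  # [True, False]
```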

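The Element class has a getchildren method. So you should use something like this to check whether there are children, storing the results in a dictionary with key = tag name: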

result = {}
for child in root.findall("item"):
    # No children: store the element's text under its tag name.
    if child.getchildren() == []:
        result[child.tag] = child.text


2014-09-20 16:14:02 marscher

'getchildren' has been deprecated since version 2.7. [From the docs](https://docs.python.org/2/library/xml.etree.elementtree.html): use list(elem) or iteration. – jlr 2014-09-20 16:15:14

You're right. It shouldn't be used anymore – marscher 2014-09-20 16:16:18
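Following the deprecation note, the same dictionary could be built without getchildren. A minimal Python 3 sketch, assuming the sample XML from the question (where item is the root, so the loop iterates over root directly rather than calling findall):

```python
import xml.etree.ElementTree as ET

XML = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""

root = ET.fromstring(XML)

result = {}
for child in root:
    # len(child) counts direct children; 0 means the element is a leaf.
    if len(child) == 0:
        result[child.tag] = child.text

print(result)
# {'a1': 'value1', 'a2': 'value2', 'a3': 'value3'}
```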

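Personally, I recommend using an XML parser that fully supports XPath expressions. The subset supported by xml.etree is not well suited to tasks like this.

For example, in lxml I can do this:

"Give me all the children of the children of the <item> node":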

doc.xpath('//item/*/child::*') #equivalent to '//item/*/*', if you're being terse 
Out[18]: [<Element a11 at 0x7f60ec1c1348>, <Element a22 at 0x7f60ec1c1888>]

or,

"Give me all of <item>'s children that have no children of their own":

doc.xpath('/item/*[count(child::*) = 0]') 
Out[20]: 
[<Element a1 at 0x7f60ec1c1588>, 
<Element a2 at 0x7f60ec1c15c8>, 
<Element a3 at 0x7f60ec1c1608>]

or

"Give me all elements that have no children":

doc.xpath('//*[count(child::*) = 0]') 
Out[29]: 
[<Element a1 at 0x7f60ec1c1588>, 
<Element a2 at 0x7f60ec1c15c8>, 
<Element a3 at 0x7f60ec1c1608>, 
<Element a11 at 0x7f60ec1c1348>, 
<Element a22 at 0x7f60ec1c1888>] 

# and if I only care about the text from those nodes... 
doc.xpath('//*[count(child::*) = 0]/text()') 
Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
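For comparison, the three selections above can be approximated with the standard library alone. This is a Python 3 sketch rather than a translation of the XPath semantics: it assumes the sample XML from the question with item as the root, and it reads each element's text attribute instead of selecting text nodes.

```python
import xml.etree.ElementTree as ET

XML = """<item>
    <a1>value1</a1>
    <a2>value2</a2>
    <a3>value3</a3>
    <a4>
        <a11>value222</a11>
        <a22>value22</a22>
    </a4>
</item>"""

doc = ET.fromstring(XML)  # doc is the <item> element

# Roughly '//item/*/*': the children of item's children.
grandchildren = [g for c in doc for g in c]

# Roughly '/item/*[count(child::*) = 0]': item's childless children.
childless_children = [c for c in doc if len(c) == 0]

# Roughly '//*[count(child::*) = 0]': every childless element in the tree.
leaves = [e for e in doc.iter() if len(e) == 0]

print([e.tag for e in grandchildren])       # ['a11', 'a22']
print([e.tag for e in childless_children])  # ['a1', 'a2', 'a3']
print([e.text for e in leaves])
# ['value1', 'value2', 'value3', 'value222', 'value22']
```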


2014-09-20 16:17:43 roippi

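Recommending lxml presumes there is a performance problem or missing XPath functionality. It is definitely better than ElementTree, but if the latter poses no problem I wouldn't bother, especially since lxml needs to be installed, and that is not always a walk in the park. – jlr 2014-09-20 17:47:56

Performance is one thing, yes, but full XPath support means you can do all the work of selecting nodes in one compact place. An XPath query takes a few seconds to write; writing Python code to walk the tree and select the nodes I want takes longer and is more likely to produce bugs. There are plenty of benefits besides performance. – roippi 2014-09-20 17:56:18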

The simplest way I have been able to find is to use the element's bool value directly. This means you can use a4 as-is in a conditional statement:

from xml.etree.ElementTree import Element

a4 = Element('a4')
if a4: 
    print('Has kids') 
else: 
    print('No kids yet') 

a4.append(Element('x')) 
if a4: 
    print('Has kids now') 
else: 
    print('Still no kids')

Running this code prints:

No kids yet 
Has kids now

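The boolean value of an element says nothing about text, tail, or attributes. It only indicates whether children exist, which is what the original question asked.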
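One caveat worth noting: the ElementTree documentation cautions that an element without subelements tests as false and recommends an explicit len(elem) or elem is not None test instead, and recent Python releases emit a deprecation warning when an element is truth-tested. A hedged Python 3 sketch of the explicit form follows; the helper name has_children is made up for illustration.

```python
from xml.etree.ElementTree import Element, SubElement


def has_children(element):
    """Return True if `element` has at least one direct child."""
    return len(element) > 0


a4 = Element('a4')
a4.text = 'some text'   # text does not count as a child
a4.set('id', '1')       # neither do attributes
print(has_children(a4))  # False

SubElement(a4, 'x')      # append a child element <x/>
print(has_children(a4))  # True
```

The explicit length check gives the same answer as the bool test while keeping "has no children" clearly separate from "is None".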


2016-07-22 18:13:46
