2016-03-15

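Sorting child elements with lxml based on attribute value

I want to sort some child elements in a document based on an attribute value. The sort function itself seems to work, but the slice assignment of the newly sorted elements does not.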

from lxml import etree 

def getkey(elem): 
    # Used for sorting elements by @LIN. 
    # returns a tuple of ints from the exploded @LIN value 
    # '1.0' -> (1,0) 
    # '1.0.1' -> (1,0,1) 
    return tuple([int(x) for x in elem.get('LIN').split('.')]) 

xml_str = """<Interface> 
       <Header></Header> 
       <PurchaseOrder> 
        <LineItems> 
         <Line LIN="2.0"></Line> 
         <Line LIN="3.0"></Line> 
         <Line LIN="1.0"></Line> 
        </LineItems> 
       </PurchaseOrder> 
      </Interface>""" 

root = etree.fromstring(xml_str) 
lines = root.findall("PurchaseOrder/LineItems/Line") 
lines[:] = sorted(lines, key=getkey) 
res_lines = [x.get('LIN') for x in lines] 
print(res_lines)

print(etree.tostring(root, pretty_print=True).decode())

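When I run the code above, I can see that the lines list is sorted correctly, because it prints ['1.0', '2.0', '3.0']. However, the XML tree is not updated, as tostring() prints the following: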

<Interface>
  <Header/>
  <PurchaseOrder>
    <LineItems>
      <Line LIN="2.0"/>
      <Line LIN="3.0"/>
      <Line LIN="1.0"/>
    </LineItems>
  </PurchaseOrder>
</Interface>

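I got the idea for the sort from http://effbot.org/zone/element-sort.htm, which says that slice assignment should be all I need to update the element order, but that doesn't seem to be the case. I realise lxml isn't 100% compatible with ElementTree, so as a sanity check I replaced the lxml import with elementtree and got exactly the same result.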

Answer

This sorts the elements and writes the output:

import xml.etree.ElementTree as ET 

tree = ET.parse("in.xml") 

def getkey(elem):
    # Used for sorting elements by @LIN.
    # Returns a tuple of ints from the exploded @LIN value:
    # '1.0' -> (1, 0)
    # '1.0.1' -> (1, 0, 1)
    return tuple(int(x) for x in elem.get('LIN').split('.'))

container = tree.find("PurchaseOrder/LineItems") 

container[:] = sorted(container, key=getkey) 

tree.write("new.xml") 
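
On the choice of sort key: the snippet below is a small sketch comparing a tuple-of-ints key with float() for dotted values like @LIN. The sample values and the helper name tuple_key are made up for illustration and do not come from the question's data.

```python
def tuple_key(lin):
    # '1.0' -> (1, 0), '1.0.1' -> (1, 0, 1)
    return tuple(int(part) for part in lin.split("."))


# Hypothetical LIN values, chosen to show where the two keys differ.
values = ["1.10", "1.9", "1.0.1", "1.0"]

# Tuples compare part by part, so (1, 9) sorts before (1, 10).
by_tuple = sorted(values, key=tuple_key)

# float() reads '1.10' as 1.1, which sorts before 1.9.
by_float = sorted(["1.10", "1.9"], key=float)

# float() cannot parse a three-part value at all.
try:
    float("1.0.1")
    float_handles_three_parts = True
except ValueError:
    float_handles_three_parts = False

print(by_tuple)
print(by_float)
print(float_handles_three_parts)
```

If every @LIN value has exactly one dot and you want numeric comparison of the fractional part, float() is fine; the tuple key is the safer default for version-style numbering.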

Or, using your own code to print the result:

from lxml import etree

# xml_str is the XML string from the question.

def getkey(elem):
    # Used for sorting elements by @LIN.
    # Returns a tuple of ints from the exploded @LIN value:
    # '1.0' -> (1, 0)
    # '1.0.1' -> (1, 0, 1)
    return tuple(int(x) for x in elem.get('LIN').split('.'))

root = etree.fromstring(xml_str)
# find() returns the LineItems element itself, not a list of its children.
lines = root.find("PurchaseOrder/LineItems")
lines[:] = sorted(lines, key=getkey)

Output:

In [12]: print(etree.tostring(root, pretty_print=True).decode())
<Interface>
  <Header/>
  <PurchaseOrder>
    <LineItems>
      <Line LIN="1.0"/>
      <Line LIN="2.0"/>
      <Line LIN="3.0"/>
    </LineItems>
  </PurchaseOrder>
</Interface>

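The key is root.find("PurchaseOrder/LineItems"): you want to find the LineItems element itself and sort its children. findall() returns a new Python list, so slice-assigning to that list reorders only the list, whereas slice-assigning to the element replaces its children in the tree.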
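
To make the difference concrete, here is a minimal sketch that runs both variants on the question's XML and reads the order back from the tree each time. It uses the standard library's xml.etree.ElementTree rather than lxml (the question reports the same behaviour with both); the helper tree_order and the two result variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

xml_str = """<Interface>
  <Header></Header>
  <PurchaseOrder>
    <LineItems>
      <Line LIN="2.0"></Line>
      <Line LIN="3.0"></Line>
      <Line LIN="1.0"></Line>
    </LineItems>
  </PurchaseOrder>
</Interface>"""


def getkey(elem):
    # '1.0' -> (1, 0), '1.0.1' -> (1, 0, 1)
    return tuple(int(x) for x in elem.get("LIN").split("."))


def tree_order(root):
    # Hypothetical helper: LIN values as they appear in the tree itself.
    return [e.get("LIN") for e in root.iter("Line")]


root = ET.fromstring(xml_str)

# findall() builds a new list; slice assignment reorders only that list.
lines = root.findall("PurchaseOrder/LineItems/Line")
lines[:] = sorted(lines, key=getkey)
order_after_list_sort = tree_order(root)

# find() returns the parent element; slice assignment replaces its children.
container = root.find("PurchaseOrder/LineItems")
container[:] = sorted(container, key=getkey)
order_after_element_sort = tree_order(root)

print(order_after_list_sort)     # ['2.0', '3.0', '1.0']
print(order_after_element_sort)  # ['1.0', '2.0', '3.0']
```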

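Ah, of course. I had naively assumed that because it was a list of references it would magically adjust the order, but I see now how silly that idea was. Thanks.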