2013-02-14 89 views
11

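Merging XML files with nested elements without external libraries

I want to merge multiple XML files using Python, without any external libraries. The XML files have nested elements.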

Sample file 1:

<root>
    <element1>textA</element1>
    <elements>
        <nested1>text now</nested1>
    </elements>
</root>

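Sample file 2:

<root>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>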

What I want:

<root>
    <element1>textA</element1>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>

What I have tried:

From this answer:

from xml.etree import ElementTree as et

def combine_xml(files):
    first = None
    for filename in files:
        data = et.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        return et.tostring(first)

What I get:

<root>
    <element1>textA</element1>
    <elements>
        <nested1>text now</nested1>
    </elements>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>

I hope you can see and understand my problem. I am looking for a proper solution, and any guidance would be great.

To clarify: with my current solution, nested elements are not merged.
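A minimal sketch of why this happens, using in-memory strings as stand-ins for the two files (the second document is assumed from the output shown above): `Element.extend` appends the children of the second root as they are and never looks for an existing child with the same tag.

```python
from xml.etree import ElementTree as et

# In-memory stand-ins for the two sample files (whitespace omitted).
# FILE2 is an assumption inferred from the merged output in the question.
FILE1 = ("<root><element1>textA</element1>"
         "<elements><nested1>text now</nested1></elements></root>")
FILE2 = ("<root><element2>textB</element2>"
         "<elements><nested1>text after</nested1>"
         "<nested2>new text</nested2></elements></root>")

first = et.fromstring(FILE1)
second = et.fromstring(FILE2)

# extend() appends every child of `second` to `first`;
# it does not match children by tag name.
first.extend(second)

child_tags = [child.tag for child in first]
print(child_tags)  # ['element1', 'elements', 'element2', 'elements']
```

The root ends up with two separate `elements` children, which is the duplication described in the question.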

Answers

18

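The code you posted combines all the elements regardless of whether an element with the same tag already exists. So you need to iterate over the elements and manually check and combine them the way you see fit, because this is not a standard way of handling XML files. I can't explain it better than the code does, so here it is, more or less commented: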

from xml.etree import ElementTree as et


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # Save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # Combine each root with the first one, and update that
            self.combine_element(self.roots[0], r)
        # Return the string representation (a str under Python 3)
        return et.tostring(self.roots[0], encoding='unicode')

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    print('-' * 20)
    print(r)
+0

Works perfectly, thanks. I had just started writing my own code. :) – 2013-02-14 16:32:14

+0

Very nice, thanks. We also needed to merge attributes. That can be done by adding 'one.attrib.update(other.attrib)' at the start of 'combine_element' and 'mapping[el.tag].attrib.update(el.attrib)' after replacing the element text. – 2013-11-04 18:38:55

+0

Oh, right, I forgot about attributes. Good catch. – jadkik94 2013-11-06 20:09:28
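The attribute merge suggested in the comments could look roughly like the sketch below. It is written as a standalone function rather than a method, and the sample documents are made up for illustration. Attributes from `other` override those of `one` when the same name occurs in both.

```python
from xml.etree import ElementTree as et


def combine_element(one, other):
    """Merge `other` into `one` in place, matching children by tag."""
    # Merge attributes first; values from `other` win on conflict.
    one.attrib.update(other.attrib)
    mapping = {el.tag: el for el in one}
    for el in other:
        target = mapping.get(el.tag)
        if target is None:
            # No child with this tag yet: add the whole subtree.
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element: replace the text, then merge the attributes.
            target.text = el.text
            target.attrib.update(el.attrib)
        else:
            # Nested element: recurse; attributes are merged at the
            # top of the recursive call.
            combine_element(target, el)


# Hypothetical sample documents, for illustration only.
a = et.fromstring('<root><item id="1" lang="en">old</item>'
                  '<group kind="x"><leaf>1</leaf></group></root>')
b = et.fromstring('<root version="2"><item id="2">new</item>'
                  '<group size="3"><leaf>2</leaf></group></root>')
combine_element(a, b)
print(et.tostring(a, encoding="unicode"))
```

After the call, `item` carries both `id="2"` (overridden) and `lang="en"` (kept), and `group` carries the attributes of both inputs.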

2

Thanks, but in my case the merge also had to take attributes into account. Here is my code after patching:

import sys
from xml.etree import ElementTree as et


class hashabledict(dict):
    # A dict that can be used as (part of) a dictionary key
    def __hash__(self):
        return hash(tuple(sorted(self.items())))


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # Save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # Combine each root with the first one, and update that
            self.combine_element(self.roots[0], r)
        # Return the merged tree
        return et.ElementTree(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from (tag name, attributes) to element, as that's what we are filtering with
        mapping = {(el.tag, hashabledict(el.attrib)): el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[(el.tag, hashabledict(el.attrib))].text = el.text
                except KeyError:
                    # An element with this name and these attributes is not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[(el.tag, hashabledict(el.attrib))], el)
                except KeyError:
                    # Not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    # All arguments but the last are input files; the last is the output file
    r = XMLCombiner(sys.argv[1:-1]).combine()
    print('-' * 20)
    print(et.tostring(r.getroot(), encoding='unicode'))
    r.write(sys.argv[-1], encoding="iso-8859-1", xml_declaration=True)
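
To illustrate the effect of keying on both tag and attributes, here is a small self-contained sketch on made-up documents. It swaps the `hashabledict` class for a `frozenset` of the attribute pairs (a substitution made here for brevity, not part of the answer above); the `key` helper is hypothetical.

```python
from xml.etree import ElementTree as et


def key(el):
    # Hypothetical helper: a frozenset of (name, value) pairs is hashable
    # and order-independent, so it can stand in for hashabledict.
    return (el.tag, frozenset(el.attrib.items()))


def combine_element(one, other):
    """Merge `other` into `one`, matching children by tag and attributes."""
    mapping = {key(el): el for el in one}
    for el in other:
        target = mapping.get(key(el))
        if target is None:
            # No child with the same tag and attributes: add it.
            mapping[key(el)] = el
            one.append(el)
        elif len(el) == 0:
            # Matching leaf: update the text.
            target.text = el.text
        else:
            # Matching nested element: recurse.
            combine_element(target, el)


# Made-up sample documents, for illustration only.
a = et.fromstring('<root><item id="1">one</item><item id="2">two</item></root>')
b = et.fromstring('<root><item id="2">TWO</item><item id="3">three</item></root>')
combine_element(a, b)

result = [(el.get("id"), el.text) for el in a]
print(result)  # [('1', 'one'), ('2', 'TWO'), ('3', 'three')]
```

Elements with the same tag but different attributes are kept side by side, while an exact match on tag and attributes has its text replaced.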