I want to parse and compare two XML files with the Python Etree parser, as described below (Python: replacing XML content with Etree).

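I have two XML files loaded with data. One is in English (the source file) and the other is the corresponding French translation (the target file). For example: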



     some more tags and info on same level 
      some more strings and entries 



some more tags and info on same level 

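The French target file has an empty cross-language reference. Whenever two macros have the same ID, I want to fill it with the information from the English source file. I have already written some code in which I replace the string tag name with a unique tag name, so that the cross-language reference can be identified. Now I want to compare the two files and, if two macros have the same ID, replace the empty reference in the French file with the information from the English file. I tried the minidom parser earlier but got stuck, and now I want to try Etree. I have almost no programming knowledge and am finding this quite hard. Here is the code I have so far: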

macros = ElementTree.parse(english)

for tag in macros.getchildren('macro'):
    id_ = tag.find('id')
    data = tag.find('cl')
    id_dict[id_.text] = data.text

macros = ElementTree.parse(french)

for tag in macros.getchildren('macro'):
    id_ = tag.find('id')
    target = tag.find('cl')
    if target.text.strip() == '':
        target.text = id_dict[id_.text]

print (ElementTree.tostring(macros))



It would be best to attach a more complex sample, so that the solution can be more correct. – pepr 2012-07-17 08:04:13




import xml.etree.ElementTree as etree

english_tree = etree.parse('en.xml')
french_tree = etree.parse('fr.xml')

# Get the root elements, as they support iteration
# through their children (direct descendants).
english_root = english_tree.getroot()
french_root = french_tree.getroot()

# Iterate through the direct descendants of the root
# elements in both trees in parallel.
for en, fr in zip(english_root, french_root):
    assert en.tag == fr.tag  # check for the same structure
    if en.tag == 'id':
        assert en.text == fr.text  # check for the same id
    elif en.tag == 'string':
        if fr.text is None:
            fr.text = en.text
            print(en.text)  # display what was replaced



import xml.etree.ElementTree as etree

english_tree = etree.parse('en.xml')
french_tree = etree.parse('fr.xml')

# Walk all elements of both trees in parallel (document order).
for en, fr in zip(english_tree.iter(), french_tree.iter()):
    assert en.tag == fr.tag  # check that the structure is the same
    if en.tag == 'id':
        assert en.text == fr.text  # identification must be the same
    elif en.tag == 'string':
        if fr.text is None:
            fr.text = en.text
            print(en.text)  # display the inserted text

# Write the result to the output file.
french_tree.write('fr2.xml', encoding='utf-8', xml_declaration=True)


import xml.etree.ElementTree as etree

def find_translation(tree, id_):
    # Search for the GH element with the given identification, and return
    # its translation if found. Otherwise None is returned implicitly.
    for gh in tree.iter('GH'):
        id_elem = gh.find('./id')
        if id_ == id_elem.text:
            # The related GH element was found.
            # Find the metadata entry and extract the translation.
            # Warning! This is a simplification for the fixed position
            # of the Translation entry.
            me = gh.find('./metadata/entry')
            assert len(me) == 2  # metadata/entry has two elements
            cl1 = me[0]
            assert cl1.text == 'Translation'
            cl2 = me[1]

            return cl2.text

# Body of the program. --------------------------------------------------

english_tree = etree.parse('en.xml')
french_tree = etree.parse('fr.xml')

for gh in french_tree.iter('GH'):  # iterate through the GH elements only
    # Get the identification of the GH section.
    id_elem = gh.find('./id')
    id_ = id_elem.text

    # Find and check the metadata entry, extract the French translation.
    # Warning! This is a simplification for the fixed position of the
    # Translation entry.
    me = gh.find('./metadata/entry')
    assert len(me) == 2  # metadata/entry has two elements
    cl1 = me[0]
    assert cl1.text == 'Translation'
    cl2 = me[1]
    fr_translation = cl2.text

    # If the French translation is empty, put there the English translation
    # from the related element.
    if cl2.text is None:
        cl2.text = find_translation(english_tree, id_)

# Write the result to the output file.
french_tree.write('fr2.xml', encoding='utf-8', xml_declaration=True)
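
The lookup above rescans the English tree once for every French GH element. As a rough sketch of a variant, the English translations could be collected into a dictionary keyed by id first, which is close to what the question attempted. The inline XML, the `cl` child tag name, and the helper names `translation_cell` and `build_index` are assumptions made for illustration only; the real files may look different.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data; the real layout of en.xml / fr.xml may differ.
EN_XML = """<root>
  <GH><id>1</id><metadata><entry><cl>Translation</cl><cl>Hello</cl></entry></metadata></GH>
  <GH><id>2</id><metadata><entry><cl>Translation</cl><cl>World</cl></entry></metadata></GH>
</root>"""

FR_XML = """<root>
  <GH><id>1</id><metadata><entry><cl>Translation</cl><cl>Bonjour</cl></entry></metadata></GH>
  <GH><id>2</id><metadata><entry><cl>Translation</cl><cl/></entry></metadata></GH>
</root>"""

def translation_cell(gh):
    # Same simplification as above: a single metadata/entry whose
    # second child holds the translation text.
    entry = gh.find('./metadata/entry')
    return entry[1]

def build_index(root):
    # Map each GH id to its translation text (one pass over the tree).
    return {gh.find('./id').text: translation_cell(gh).text
            for gh in root.iter('GH')}

en_root = etree.fromstring(EN_XML)
fr_root = etree.fromstring(FR_XML)
en_index = build_index(en_root)

for gh in fr_root.iter('GH'):
    cell = translation_cell(gh)
    if cell.text is None:
        # Fill the empty French cell from the English entry with the same id.
        cell.text = en_index.get(gh.find('./id').text)

result = build_index(fr_root)
print(result)
```

With real files, `etree.parse('en.xml').getroot()` would take the place of `etree.fromstring(...)`.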

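Now it is time for XPath (the standard 'xml.etree.ElementTree' supports only some of its features, but they are powerful enough for this case). Try the modified answer (the last part). Fix the names of the input/output files. Then I suggest cleaning up the comments here to make the page more readable and useful. – pepr 2012-07-17 14:26:12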
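
As a small illustration of the XPath subset mentioned in the comment above, the sketch below selects a GH element directly by the text of its id child using a `[tag='text']` predicate. The sample XML, the `cl` tag name, and the helper `find_gh_by_id` are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data; the element layout is an assumption.
EN_XML = """<root>
  <GH><id>1</id><metadata><entry><cl>Translation</cl><cl>Hello</cl></entry></metadata></GH>
  <GH><id>2</id><metadata><entry><cl>Translation</cl><cl>World</cl></entry></metadata></GH>
</root>"""

root = etree.fromstring(EN_XML)

def find_gh_by_id(root, id_):
    # [id='...'] is a predicate: it keeps only GH elements that have
    # an <id> child whose text equals the given value.
    # Returns None when no GH element matches.
    return root.find(".//GH[id='%s']" % id_)

gh = find_gh_by_id(root, '2')
cells = gh.findall('./metadata/entry/cl')
print(cells[1].text)
```

The predicate compares the complete text of the child, so surrounding whitespace inside `<id>` would prevent a match.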


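Right... If the Translation entry is not at a fixed position, could I rename the "entry" tag around the translation to something unique and find it that way, or is that not recommended? (I tried this approach and it did not work, but I would like to know whether it is the right direction.) – Kaly 2012-07-17 15:18:41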


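Renaming tags should probably not be done in general. It would be better if the tag/element had its own special name. In that sense, '' is not a good example. But I understand that the user may decide to insert that column interactively, and the underlying software cannot guess what the user wants. – pepr 2012-07-17 16:04:53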
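
One possible way to handle a Translation entry that is not at a fixed position, without renaming any tags, is to scan all entries and pick the one whose first child carries the label. This is only a sketch under assumptions: the sample XML, the `cl` tag name, the placeholder text 'Bonjour', and the helper `find_translation_cell` are invented for illustration. It uses Python 3 (`encoding='unicode'`).

```python
import xml.etree.ElementTree as etree

# Hypothetical GH section with several metadata entries in arbitrary order.
GH_XML = """<GH>
  <id>7</id>
  <metadata>
    <entry><cl>Comment</cl><cl>n/a</cl></entry>
    <entry><cl>Translation</cl><cl></cl></entry>
  </metadata>
</GH>"""

def find_translation_cell(gh, label='Translation'):
    # Return the second child of the entry whose first child has the
    # given label text, or None if there is no such entry.
    for entry in gh.findall('./metadata/entry'):
        cells = list(entry)
        if len(cells) == 2 and cells[0].text == label:
            return cells[1]
    return None

gh = etree.fromstring(GH_XML)
cell = find_translation_cell(gh)

# Treat both a missing text (None) and whitespace-only text as empty.
if cell is not None and not (cell.text or '').strip():
    cell.text = 'Bonjour'  # placeholder; normally taken from the English file

print(etree.tostring(gh, encoding='unicode'))
```

The design choice here is to identify the entry by its content rather than by its tag, so the input files stay unmodified.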