Creating an XML tree from a text file in Python

I need to avoid creating duplicate branches in an XML tree while parsing a text file. Say the text file looks like this (the order of the lines is random):

BRANCH1:branch11:message11
BRANCH1:branch12:message12
BRANCH2:branch21:message21
BRANCH2:branch22:message22

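So the resulting XML tree should have a root with two branches, and each of those branches should have two sub-branches. The Python code I use to parse this text file is as follows: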

import xml.etree.ElementTree as ET

with open('xmlbasic.txt', 'r') as fh:
    allLines = fh.readlines()

root = ET.Element('root')

for line in allLines:
    tempv = line.split(':')
    # A new first-level branch is created for every line, even if one
    # with the same name already exists (this is the problem).
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')

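The problem with this code is that a new branch is created in the XML tree for every line of the text file.

Any suggestions on how to avoid creating another branch in the XML tree if a branch with that name already exists?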


2010-09-21 bitman

import xml.etree.ElementTree as ET

# Read the file as a list of lines; read() alone would return one string.
with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()

root = ET.Element('root')

for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.split(":", 2)

    # find() returns None when no child has this tag.
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)

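The logic is simple, and you already stated it in your question! Before creating a branch, you just check whether a branch with that name already exists in the tree.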
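
The check can also be sketched as a small helper. One detail worth noting: `find` returns `None` when nothing matches, and an `Element` without children has length zero, so comparing against `None` explicitly is the safer test. The helper name `get_or_create` and the sample lines are illustrative assumptions, not part of the original answer.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag):
    """Return the first direct child of parent named tag, creating it if absent.

    Hypothetical helper, for illustration only.
    """
    child = parent.find(tag)
    if child is None:  # explicit None check; a childless Element has len() == 0
        child = ET.SubElement(parent, tag)
    return child


# Sample input mirroring the question's file, in shuffled order.
sample_lines = [
    "BRANCH2:branch21:message21",
    "BRANCH1:branch11:message11",
    "BRANCH2:branch22:message22",
    "BRANCH1:branch12:message12",
]

root = ET.Element("root")
for line in sample_lines:
    head, subhead, tail = line.strip().split(":", 2)
    get_or_create(get_or_create(root, head), subhead).text = tail

print(ET.tostring(root, encoding="unicode"))
```

Branches appear in the order in which their names are first seen, so shuffled input still yields one element per name.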

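Note that this can be inefficient, because for every line `find` scans the children of the element one by one. ElementTree does not index children by tag, since it is not designed to enforce unique names.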

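If you need the speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict first and convert it to an ElementTree afterwards.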

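import collections
import xml.etree.ElementTree as ET

# Read the file as a list of lines; read() alone would return one string.
with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()

# First pass: group messages by head and subhead.
root_dict = collections.defaultdict(dict)
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.split(":", 2)
    root_dict[head][subhead] = tail

# Second pass: build the XML tree from the grouped data.
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)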
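
One behaviour of the dictionary approach to be aware of: if the same head:subhead pair occurs on more than one line, the later message replaces the earlier one, because a dict holds one value per key. The sketch below wraps the same two steps in a function and shows that effect. The name `build_tree` and the sample data are assumptions made for illustration.

```python
import collections
import xml.etree.ElementTree as ET


def build_tree(lines):
    """Group lines of the form head:subhead:message, then build the XML tree.

    Hypothetical wrapper around the defaultdict approach.
    """
    grouped = collections.defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        head, subhead, tail = line.split(":", 2)
        grouped[head][subhead] = tail  # a repeated pair overwrites the old text

    root = ET.Element("root")
    for head, branch in grouped.items():
        head_element = ET.SubElement(root, head)
        for subhead, tail in branch.items():
            ET.SubElement(head_element, subhead).text = tail
    return root


sample_lines = [
    "BRANCH1:branch11:first\n",
    "BRANCH2:branch21:message21\n",
    "BRANCH1:branch11:second\n",
]
root = build_tree(sample_lines)
print(ET.tostring(root, encoding="unicode"))
```

Because the function only iterates over `lines`, an open file object can be passed in directly, so a large file need not be loaded into a list first.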


2010-09-21 10:30:40 katrielalex

Thanks, this and the other answers are all good, but I'll stick with the defaultdict, because the actual text and XML files are fairly large. – bitman 2010-09-21 11:54:26

Something along these lines? You keep the branch-level elements in a dictionary so they can be reused.

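# Continues from the question's code: allLines and root are defined there.
b1map = {}  # first-level tag name -> its Element

for line in allLines:
    tempv = line.strip().split(':')
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        # First time this tag appears: create the branch and remember it.
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]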
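
The snippet above caches only the first level, so a repeated second-level name would still produce a second element. If that matters, the same idea can be extended by keying a second dictionary on the (head, subhead) pair. This is a sketch under that assumption; the names `heads`, `subheads`, and `sample_lines` are illustrative.

```python
import xml.etree.ElementTree as ET

sample_lines = [
    "BRANCH1:branch11:message11",
    "BRANCH2:branch21:message21",
    "BRANCH1:branch12:message12",
    "BRANCH1:branch11:updated11",
]

root = ET.Element("root")
heads = {}     # head tag -> Element
subheads = {}  # (head tag, subhead tag) -> Element

for line in sample_lines:
    head, subhead, tail = line.strip().split(":", 2)

    head_el = heads.get(head)
    if head_el is None:
        head_el = heads[head] = ET.SubElement(root, head)

    sub_el = subheads.get((head, subhead))
    if sub_el is None:
        sub_el = subheads[(head, subhead)] = ET.SubElement(head_el, subhead)

    sub_el.text = tail  # a repeated pair reuses the element; the last text wins

print(ET.tostring(root, encoding="unicode"))
```

Each lookup is a dictionary access rather than a scan of the parent's children, which should help with large inputs.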


2010-09-21 10:13:07 piro
