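I've inherited some XML that I need to process in Python. I'm using xml.etree.cElementTree, and I'm having trouble associating the text that follows an empty element with that empty element's tag. The real XML is much more complex than what I've pasted below, but I've simplified it to make the problem clearer (I hope!). How can I associate XML text with the empty element that precedes it in Python?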
I'd like the result to be a dictionary like this:
Desired result
{(9, 1): 'As they say, A student has usually three maladies:', (9, 2): 'poverty, itch, and pride.'}
The tuples could also contain strings (e.g., ('9', '1')). I really don't care about that at this early stage.
Here is the XML:
test1.xml
<div1 type="chapter" num="9">
<p>
<section num="1"/> <!-- The empty element -->
As they say, A student has usually three maladies: <!-- Here lies the trouble -->
<section num="2"/> <!-- Another empty element -->
poverty, itch, and pride.
</p>
</div1>
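The core of the problem is where ElementTree puts the text. It does not store the words that follow an element inside that element; it stores them on the element's `tail` attribute, while `text` holds only what sits between an element's own start and end tags (nothing, for an empty element). A minimal sketch that makes this visible, assuming Python 3 (where `xml.etree.cElementTree` was removed in 3.9, so it imports `xml.etree.ElementTree`) and using the sample above without the XML comments; `SAMPLE` and `text_and_tails` are illustrative names, not from the original:

```python
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# The sample document from the question, minus the explanatory XML comments
SAMPLE = """<div1 type="chapter" num="9">
<p>
<section num="1"/>
As they say, A student has usually three maladies:
<section num="2"/>
poverty, itch, and pride.
</p>
</div1>"""


def text_and_tails(root):
    """Return (num, text, tail) for every <section> element under root."""
    return [(s.get("num"), s.text, s.tail) for s in root.iter("section")]


rows = text_and_tails(ET.fromstring(SAMPLE))
for row in rows:
    # An empty <section/> has text None; the sentence after it is its tail
    print(row)
```

Each row should show `None` for `text` and the following sentence (with its surrounding newlines) as `tail`.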
Here is what I've tried:
Attempt 1
>>> import xml.etree.cElementTree as ET
>>> tree = ET.parse('test1.xml')
>>> root = tree.getroot()
>>> chapter = root.attrib['num']
>>> d = dict()
>>> for p in root:
...     for section in p:
...         d[(int(chapter), int(section.attrib['num']))] = section.text
...
>>> d
{(9, 2): None, (9, 1): None} # This of course makes sense, since the elements are empty
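Attempt 1 is close: it only reads the wrong attribute. Because the sentence after `<section num="1"/>` is stored as that element's `.tail`, reading `.tail` instead of `.text` gives the desired dictionary for this sample. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; the function name `sections_to_dict` is hypothetical:

```python
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

SAMPLE = """<div1 type="chapter" num="9">
<p>
<section num="1"/>
As they say, A student has usually three maladies:
<section num="2"/>
poverty, itch, and pride.
</p>
</div1>"""


def sections_to_dict(root):
    """Map (chapter, section) -> the text that follows each empty <section/>."""
    chapter = int(root.get("num"))
    d = {}
    for p in root:
        for section in p:
            # Text after an empty element lives in its .tail (None if absent)
            d[(chapter, int(section.get("num")))] = (section.tail or "").strip()
    return d


print(sections_to_dict(ET.fromstring(SAMPLE)))
# {(9, 1): 'As they say, A student has usually three maladies:', (9, 2): 'poverty, itch, and pride.'}
```

With a file, `ET.parse('test1.xml').getroot()` can be passed in place of `ET.fromstring(SAMPLE)`. This only captures the text directly after each `<section/>`; if other child elements sit between sections, their text is not included.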
Attempt 2
>>> for p in root:
...     for section, text in zip(p, p.itertext()): # unfortunately, p and p.itertext() are two different lengths, which also makes sense
...         d[(int(chapter), int(section.attrib['num']))] = text.strip()
...
>>> d
{(9, 2): 'As they say, A student has usually three maladies:', (9, 1): ''}
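As you can see from the second attempt, p and p.itertext() have different lengths. The value stored under (9, 2) is the one I'm trying to associate with the key (9, 1), and the value I want to associate with (9, 2) doesn't even appear in d (because zip truncates the longer p.itertext()).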
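The lengths differ because `p.itertext()` also yields `p.text`, the whitespace between `<p>` and the first `<section/>`, so every string is shifted by one position relative to the sections. Since the real XML is more complex, the following text may also be split across inline child elements, which `.tail` alone would miss. A sketch that walks the children of each `<p>` in document order and accumulates all text until the next `<section/>`, under these assumptions: any non-`section` child is inline markup whose text belongs to the current section, text before the first `<section/>` is ignored, and runs of whitespace are collapsed. The `<hi>` element in the sample and the name `collect_sections` are hypothetical, not from the original XML:

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample with inline markup inside section 1
SAMPLE = """<div1 type="chapter" num="9">
<p>
<section num="1"/>
As they say, A student has <hi>usually</hi> three maladies:
<section num="2"/>
poverty, itch, and pride.
</p>
</div1>"""


def _normalize(parts):
    """Join text fragments and collapse all runs of whitespace to one space."""
    return " ".join("".join(parts).split())


def collect_sections(root):
    """Map (chapter, section) -> all text up to the next <section/> in each <p>."""
    chapter = int(root.get("num"))
    result = {}
    for p in root.iter("p"):
        key, parts = None, []
        for child in p:
            if child.tag == "section":
                if key is not None:
                    result[key] = _normalize(parts)  # close the previous section
                key = (chapter, int(child.get("num")))
                parts = [child.tail or ""]  # text right after the empty element
            elif key is not None:
                # Inline element: keep its inner text plus the text after it
                parts.extend(child.itertext())
                parts.append(child.tail or "")
        if key is not None:
            result[key] = _normalize(parts)  # close the last section in this <p>
    return result


print(collect_sections(ET.fromstring(SAMPLE)))
# {(9, 1): 'As they say, A student has usually three maladies:', (9, 2): 'poverty, itch, and pride.'}
```

Collapsing whitespace is a design choice to hide the newlines around each sentence; drop `_normalize` and use `"".join(parts)` if the original spacing matters.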
Any help would be greatly appreciated. Thanks in advance.
Brilliant. Works like a charm. Thanks. – user3079064