Python: reading XML with related child elements
I have an XML file with this structure:
<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
  <product>
    <serialNumber value="764000606"/>
  </product>
  <visits>
    <visit>
      <general>
        <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="03081" name="WSSA" index="0016"/>
      </parts>
    </visit>
    <visit>
      <general>
        <startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="02081" name="PSSF" index="0017"/>
      </parts>
    </visit>
  </visits>
</logbook:LogBook>
I want two outputs from this XML:
1 - Visits, including the serial number. For this I wrote:
import pandas as pd
import xml.etree.ElementTree as ET

# filename: path to the XML file shown above
tree = ET.parse(filename)
root = tree.getroot()
# Note: DataFrame.append was removed in pandas 2.0, so this loop needs pandas < 2.0.
visits = pd.DataFrame()
for general in root.iter('general'):
    for child in root.iter('serialNumber'):
        visits = visits.append({'startDateTime': general.find('startDateTime').text,
                                'endDateTime': general.find('endDateTime').text,
                                'serialNumber': child.attrib['value']},
                               ignore_index=True)
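Because `DataFrame.append` no longer exists in current pandas, the same visits table can be built by collecting plain dicts and creating the DataFrame once at the end. The sketch below also reads the serial number a single time instead of looping over it, assuming (as in the sample) that each file contains exactly one `<serialNumber>`. The helper name `visit_rows` is hypothetical, and the sample XML is inlined so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
  <product>
    <serialNumber value="764000606"/>
  </product>
  <visits>
    <visit>
      <general>
        <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="03081" name="WSSA" index="0016"/>
      </parts>
    </visit>
    <visit>
      <general>
        <startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="02081" name="PSSF" index="0017"/>
      </parts>
    </visit>
  </visits>
</logbook:LogBook>"""


def visit_rows(root):
    """Hypothetical helper: return one dict per <general> block (one per visit)."""
    # Assumption: the file has exactly one <serialNumber>, under <product>.
    serial = root.find('product/serialNumber').attrib['value']
    rows = []
    for general in root.iter('general'):
        rows.append({
            'serialNumber': serial,
            'startDateTime': general.findtext('startDateTime'),
            'endDateTime': general.findtext('endDateTime'),
        })
    return rows


root = ET.fromstring(XML)
rows = visit_rows(root)
```

With pandas installed, `visits = pd.DataFrame(rows)` then gives one row per visit, with columns in the dict's insertion order.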
This code produces the following DataFrame:
serialNumber | startDateTime            | endDateTime              |
-------------|--------------------------|--------------------------|
764000606    | 2014-01-10T12:22:39.166Z | 2014-03-11T13:51:31.480Z |
764000606    | 2013-01-10T12:22:39.166Z | 2013-03-11T13:51:31.480Z |
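2 - Parts
For parts, I want the following output. Visits are distinguished from one another by their startDateTime, and I want to show the parts associated with each visit: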
serialNumber | startDateTime | number | name | index |
-------------|---------------|--------|------|-------|
For the parts I wrote:
parts = pd.DataFrame()
for part in root.iter('part'):
    for child in root.iter('serialNumber'):
        parts = parts.append({'index': part.attrib['index'],
                              'znumber': part.attrib['number'],
                              'name': part.attrib['name'],
                              'serialNumber': child.attrib['value'],
                              'startDateTime': general.find('startDateTime').text},
                             ignore_index=True)
This is what I get from this code:
index |name|serialNumber| startDateTime |znumber|
------|----|------------|------------------------|-------|
0016 |WSSA| 764000606 |2013-01-10T12:22:39.166Z| 03081 |
0017 |PSSF| 764000606 |2013-01-10T12:22:39.166Z| 02081 |
whereas I want this (look at startDateTime):
index |name|serialNumber| startDateTime |znumber|
------|----|------------|------------------------|-------|
0016 |WSSA| 764000606 |2014-01-10T12:22:39.166Z| 03081 |
0017 |PSSF| 764000606 |2013-01-10T12:22:39.166Z| 02081 |
Any ideas? I am using XML ElementTree.
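In the parts loop, `general` is not tied to the current `part` at all: it is whatever the earlier `for general in root.iter('general')` loop left behind, and a Python loop variable keeps its last value after the loop ends. That last `<general>` belongs to the 2013 visit, so every part gets the 2013 start time. One way to keep each part with its own visit is to iterate over `<visit>` elements and scope both lookups (start time and parts) to that visit. The sketch below assumes a single `<serialNumber>` per file, as in the sample; the helper name `part_rows` is hypothetical and the sample XML is inlined so it runs on its own.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
  <product>
    <serialNumber value="764000606"/>
  </product>
  <visits>
    <visit>
      <general>
        <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="03081" name="WSSA" index="0016"/>
      </parts>
    </visit>
    <visit>
      <general>
        <startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="02081" name="PSSF" index="0017"/>
      </parts>
    </visit>
  </visits>
</logbook:LogBook>"""


def part_rows(root):
    """Hypothetical helper: return one dict per <part>, tagged with its visit's start time."""
    # Assumption: the file has exactly one <serialNumber>, under <product>.
    serial = root.find('product/serialNumber').attrib['value']
    rows = []
    for visit in root.iter('visit'):
        # Both lookups are relative to this <visit>, so a part can only
        # pick up the startDateTime of the visit it belongs to.
        start = visit.findtext('general/startDateTime')
        for part in visit.findall('parts/part'):
            rows.append({
                'serialNumber': serial,
                'startDateTime': start,
                'number': part.attrib['number'],
                'name': part.attrib['name'],
                'index': part.attrib['index'],
            })
    return rows


root = ET.fromstring(XML)
rows = part_rows(root)
```

With pandas installed, `pd.DataFrame(rows, columns=['serialNumber', 'startDateTime', 'number', 'name', 'index'])` produces the column layout asked for above, with 2014 for part 0016 and 2013 for part 0017.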
Shouldn't there be a closing tag at the end of the file? An XML file should contain only one root node. – CristiFati
Is visits a pandas DataFrame? – mzjn
@mzjn yes visit = pandas.DataFrame() – Safariba