2014-09-02 153 views

Parsing XML with Python's ElementTree

I have a long XML document structured like this:

<carrierData> 
    <inspections> 
     <inspection inspection_date="2013-01-16" report_state="TX" report_number="TX130G0ELJ05" level="1" time_weight="1"> 
      <drivers> 
       <driver driver_type="Primary Driver" first_name="JOHN" last_name="SMITH" date_of_birth="1962-11-20" license_state="TX" License_number="12345678"/> 
       <driver driver_type="CoDriver"/> 
      </drivers> 
      <vehicles> 
       <vehicle unit="1" vehicle_id_number="2HSCAAXN02C039269" unit_type="Truck Tractor" license_state="TX" license_number="1B13577"/> 
       <vehicle unit="2" vehicle_id_number="1GRAA76228S702393" unit_type="Semi-Trailer" license_state="TX" license_number="X99757"/> 
      </vehicles> 
      <violations> 
       <violation code="393.11" description="No/defective lighting devices/reflective devices/projected" oos="N" time_severity_weight="3" BASIC="Vehicle Maint."/> 
       <violation code="393.53(b)" description="Automatic brake adjuster CMV manufactured on or after 10/20/1994 - air brake" oos="N" time_severity_weight="4" BASIC="Vehicle Maint."/> 
       <violation code="393.47(e)" description="Clamp/Roto-Chamber type brake(s) out of adjustment" oos="N" time_severity_weight="4" BASIC="Vehicle Maint."/> 
       <violation code="396.3(a)(1)" description="Inspection/repair and maintenance parts and accessories" oos="N" time_severity_weight="2" BASIC="Vehicle Maint."/> 
      </violations> 
    </inspection> 

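I need to iterate over a list of inspection report numbers and print the first and last name of each driver associated with each number in the list. I am using Python's ElementTree to parse the XML, and although I get no errors with the code below, it does not give me any results either: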

import xml.etree.ElementTree as ET 

codes = ['TX3YZ8HQE1X1', 'TX3YAEHQE15W', 'KS00YQ008857', 'TX43D99DAN33', 'NM3267100378', 
     'COPF31000853', 'TX3ZYF0MUQ6F', 'TX3ZFC0MHXLU', 'TX3Z760MGU0H', 'TX3YGG0MUQ1R', 
     'TX3YBD0MUI0A', 'TX3XPF0MKQYG', 'TX3X8F0MHXA7', 'AZ0160001581', 'TX3WC40ADYGZ', 
     'ID6300005350', 'TX3VV50ADUOI', 'TX137S0ELO02', 'UTCE03208119', 'UTCE03208119', 
     'TX3UTG0MJKDL', 'TX3UD60MIJU5', 'TX13690EBI05', 'TX3U4E0AFA94', 'TX3U4E0AFA94', 
     'TX3T5F0MIJMH', 'TX13550BKL02', 'TX3SLE0MIJGZ', 'TX3SLE0MIJGZ', 'TX3S8D0AFH3D', 
     'UTCE03207947', 'TX133Q0ENG01', 'TX133Q0ENG01', 'TX133Q0ENG01', 'TX3REM0MHEK3', 
     'ID0000169042', 'COPF05000200', 'TX13280EPV0B', 'TX131S9DAB02', 'CO1E19000017', 
     'TX3PD60WAA4L', 'TX1317W1NW07', 'CO2D02000044', 'LALAEQ001266', 'TX130H0EBT06', 
     'TX3NW10ABLMK', 'NV7233010192', 'NV4045000998', 'CO3301000406', 'CO5C01000218', 
     'TX12949DBU03', 'FL1619000314', 'TX12929DIE02', 'TX128X0AAP01', 'TX128A9DHA07', 
     'CO2B01000061', 'TX1274W1DV01', 'TX126Z9DCM01', 'TX127U9DBV01', 'TX127U9DBV01', 
     'TX127R9DIZ02', 'TX127K9DCQ06', 'AZ0YDG000141', 'NV7196001031', 'TX126B0FJZ01', 
     'TX126I9DAN01', 'LALACV003777', 'CO2B12000014', 'TX12650HTB01', 'ID0000220955'] 

tree = ET.parse("C:\All_BASICs_07-25-2014.xml") 
root = tree.getroot() 

for x in codes: 
    for node in tree.iter('inspection'): 
     if ['report_id'] == [x]: 
      name = node.attrib.get('first_name','last_name') 
      print name 

I'm new to programming, so I may be missing something obvious here, but with no errors to go on, I'm having a hard time tracking down the problem.

Answer


What is this line meant to do?

if ['report_id'] == [x]: 

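With this code you are testing ['report_id'] == ['TX3YZ8HQE1X1'], ['report_id'] == ['TX3YAEHQE15W'], and so on. Each test compares two one-element lists holding different strings, so it will never be true. That is why your code exits without printing anything or raising an error.

Nothing in the XML you posted is named report_id. Did you mean report_number?

If you want to get the primary driver's name for every report_number in the codes list, try something like this: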
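To make the difference concrete, here is a minimal sketch that compares the list-literal test with an actual attribute lookup. The one-element XML string is an assumption, trimmed from the sample in the question, and the variable names literal_test, attribute_test and missing are made up for illustration. It uses the Python 3 print() function rather than the Python 2 print statement used elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element sample, trimmed from the XML in the question.
node = ET.fromstring('<inspection report_number="TX130G0ELJ05" level="1"/>')
x = 'TX130G0ELJ05'

# Two list literals holding different strings: the element is never consulted.
literal_test = (['report_id'] == [x])

# Reading the attribute from the element compares the real report number.
attribute_test = (node.get('report_number') == x)

# Element.get() returns None for an attribute that is not present.
missing = node.get('report_id')

print(literal_test, attribute_test, missing)
```

Using node.get(...) instead of node.attrib[...] is a design choice here: it avoids a KeyError on elements that lack the attribute.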

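for x in codes:
    for node in tree.iter('inspection'):
        # Compare the element's own report_number attribute with the code.
        if node.attrib['report_number'] == x:
            # Keep only the <driver> elements marked as the primary driver.
            primary_driver = [d for d in node.iter('driver') if d.attrib['driver_type'] == "Primary Driver"]
            # Assumes at least one primary driver; an empty list raises IndexError.
            primary_driver = primary_driver[0]
            first_name = primary_driver.attrib['first_name']
            last_name = primary_driver.attrib['last_name']
            # Python 2 print statement, as in the question.
            print first_name, last_name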

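However, there is a performance problem with this code. For every code in codes, you are iterating over the entire XML document. That has complexity O(number_of_codes * number_of_records), which is roughly O(N**2) when both grow together. You can do it in O(N) steps instead by looping over the document once and using a set to decide whether each record should be included.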
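A minimal sketch of that single-pass approach follows, under some assumptions: the helper name primary_drivers and the SAMPLE string are hypothetical, the second inspection (ZZ0000000000, JANE DOE) is made-up data added only to show a record being skipped, and the code uses Python 3 print() rather than the Python 2 print statement above.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the first inspection is trimmed from the question,
# the second is invented purely to show a non-matching record.
SAMPLE = """<carrierData>
  <inspections>
    <inspection report_number="TX130G0ELJ05">
      <drivers>
        <driver driver_type="Primary Driver" first_name="JOHN" last_name="SMITH"/>
        <driver driver_type="CoDriver"/>
      </drivers>
    </inspection>
    <inspection report_number="ZZ0000000000">
      <drivers>
        <driver driver_type="Primary Driver" first_name="JANE" last_name="DOE"/>
      </drivers>
    </inspection>
  </inspections>
</carrierData>"""


def primary_drivers(root, codes):
    """Return (first_name, last_name) pairs for inspections listed in codes."""
    wanted = set(codes)  # average O(1) membership tests; duplicates collapse
    found = []
    for node in root.iter('inspection'):  # one pass over the document
        if node.get('report_number') not in wanted:
            continue
        for d in node.iter('driver'):
            if d.get('driver_type') == 'Primary Driver':
                found.append((d.get('first_name'), d.get('last_name')))
                break  # stop at the first primary driver of this inspection
    return found


root = ET.fromstring(SAMPLE)
names = primary_drivers(root, ['TX130G0ELJ05', 'TX130G0ELJ05', 'KS00YQ008857'])
for first_name, last_name in names:
    print(first_name, last_name)
```

Because the codes go into a set, repeated entries in the original list (such as 'TX133Q0ENG01', which appears three times) produce one result per matching inspection rather than one per repetition. With a real file, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE).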


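Thank you, sir! That did it! I did have the correct 'report_number' attribute in my code, but I think I had 'report_id' there when I first typed it in, so sorry for the confusion. Otherwise it gave me exactly what I needed, and seeing the answer here gave me a clearer understanding of what I was trying to do. Using set() was also mentioned above, and I did implement that. Thanks again! – jerodestapa 2014-09-02 16:56:57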