2011-02-16 162 views

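How to convert a Python list comprehension to XML

I need some help finding a tutorial or example that takes a list comprehension, merges it with data from a csv file, and turns all of that into an xml file. From reading various Python books and PDFs (ditp, IYOCGwP, learnpythonthehardway, the lxml tutorial, Think Python) and searching online, I think I'm most of the way there; I just need a push to tie everything together. I'm basically taking an Excel spreadsheet that I exported as a csv file. The csv contains rows of records that I need to map into an xml file. I'm very new to Python and thought I'd use this little project to learn the language. The code listed below isn't pretty, but it works. I can read a csv file and dump it into a list, I can merge 3 lists and output the resulting list, and I can get my program to spit out a skeleton xml laid out almost in the format I need. Below the code I list a small sample of the actual output and the xml I'm trying to produce. Sorry if this is too long-winded; this is my first post.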

import csv, datetime, os, sys
from lxml import etree
from ElementTree_pretty import prettify

f = os.path.getsize("SO.csv")
fh = "SO.csv"
rh = open(fh, "rU")

rows = 0
try:
    rlist = csv.reader(rh)
    reports = []
    for row in rlist:
        # print row.items()
        rowStripped = [x.strip(' ') for x in row]
        reports.append(rowStripped)
        rows += 1
except csv.Error as e:
    # report the file name and the reader's current line number
    sys.exit('file %s, line %d: %s' % (fh, rlist.line_num, e))
finally:
    rh.close()

root = etree.Element("co_ehs") 
object = etree.SubElement(root, "object") 
event = etree.SubElement(object, "event") 
facets = etree.SubElement(event, "facets") 
categories = etree.SubElement(facets, "categories") 
instance = etree.SubElement(categories, "instance") 
property = etree.SubElement(instance, "property") 

facets = ['header','header','header','header','informational','header','informational'] 

categories =  ['processing','processing','processing','processing','short_title','file_num','short_narrative'] 

property = ['REPORT ID','NEXT REPORT ID','initial-event-date','number','title','summary-docket-num','description-story'] 

print('----------Printing Reports from CSV Data----------') 
print reports 
print('---------END OF CSV DATA-------------') 
print 
mappings = zip(facets, categories, property) 
print('----------Printing Mappings from the zip of facets, categories, property ----------') 
print mappings 
print('---------END OF List Comprehension-------------') 
print 
print('----------Printing the xml skeleton that will contain the mappings and the csv data ----------') 
print(etree.tostring(root, xml_declaration=True, encoding='UTF-8', pretty_print=True)) 
print('---------END OF XML Skeleton-------------') 


----My OUTPUT--- 
----------Printing Reports from CSV Data---------- 
[['1', '12-Dec-04', 'Vehicle Collision', '786689', 'No fault collision due to ice', '-1', '545671'], ['3', '15-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '4', '588456'], ['4', '17-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '-1', '58871'], ['1000', '12-Nov-05', 'Back Injury', '9854231', 'Lifting without a support device', '-1', '545671'], ['55555', '12-Jan-06', 'Foot Injury', '7936547', 'Office injury - heavy item dropped on foot', '-1', '545671']] 
---------END OF CSV DATA------------- 
----------Printing Mappings from the zip of facets, categories, property ---------- 
[('header', 'processing', 'REPORT ID'), ('header', 'processing', 'NEXT REPORT ID'), ('header', 'processing', 'initial-event-date'), ('header', 'processing', 'number'), ('informational', 'short_title', 'title'), ('header', 'file_num', 'summary-docket-num'), ('informational', 'short_narrative', 'description-story')] 
---------END OF List Comprehension------------- 
----------Printing the xml skeleton that will contain the mappings and the csv data ---------- 

    <?xml version='1.0' encoding='UTF-8'?> 
    <co_ehs> 
     <object> 
     <event> 
      <facets> 
      <categories> 
       <instance> 
       <property/> 
       </instance> 
      </categories> 
      </facets> 
     </event> 
     </object> 
    </co_ehs>

---------END OF XML Skeleton------------- 
----------CSV DATA------------------ 
C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION 
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice" 
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device" 
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot" 

-----------What I want the xml output to look like---------------------- 
    <?xml version="1.0" encoding="UTF-8"?> 
    <co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="co_ehs.xsd"> 
     <object id="3" object-type="ehs_report"> 
     <event event-tag="0"> 
      <facets name="header"> 
      <categories name="processing"> 
       <instance instance-tag="0"> 
       <property name="REPORT ID" value="1"/> 
       <property name="NEXT REPORT ID" value="-1"/> 
       <property name="initial-event-date" value="12-Dec-04"/> 
       <property name="number" value="545671"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_title"> 
       <instance instance-tag="0">
       <property name="title" value="Vehicle Collision"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="header"> 
      <categories name="file_num"> 
       <instance instance-tag="0">
       <property name="summary-docket-num" value="786689"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_narrative"> 
       <instance instance-tag="0">
       <property name="description-story" value="No fault collision due to ice"/> 
       </instance> 
      </categories> 
      </facets> 
     </event> 
     </object> 
    </co_ehs> 

What are the rules for the object's id attribute and the event's event-tag attribute? I assume event-tag is just a counter? – 2011-02-22 18:27:44

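@ocaso-protal As I understand it, the object's id attribute is a unique id or integer for that record. The next record of that object type would be equal to or greater than 4. I believe event-tag is used in a similar way: when it is inserted into the rdms, each event tag is given a unique identifier. I've been tasked with taking the csv file and serializing it into schema-conforming xml so that the final xml can be fed into the rdms. I'd like to know how to iterate over the mappings and insert each one into the tree before generating the whole xml document. – MWR 2011-02-23 13:12:53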

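@ocaso-protal I converted the csv from a dictionary to a list, so my goal is to iterate over each mapping and generate the corresponding tags, and to iterate over the whole list (all of the csv data) and insert the appropriate data item into each tag. Each data item has a facet, a category, an instance tag, and a property. – MWR 2011-02-23 13:29:30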

Answers

0

Here is my solution. I use lxml because generating XML with a framework is usually better than using strings or template files.

The attributes of co_ehs are missing, but that can easily be fixed with a few set() calls. I leave that to you.
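
For the namespaced attribute on co_ehs, a minimal sketch of such a set() call might look like the following. It assumes Python 3 and uses the standard library's xml.etree.ElementTree so that it runs without lxml; as far as I know lxml accepts the same `{namespace-uri}name` key in `set()`, where the prefix is usually declared by passing `nsmap={"xsi": XSI}` when creating the root. The constant name `XSI` is only an illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the schema-instance namespace from the desired output.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Ask the serializer to use the "xsi" prefix instead of a generated one.
ET.register_namespace("xsi", XSI)

root = ET.Element("co_ehs")
# Namespaced attributes are addressed as {namespace-uri}local-name.
root.set("{%s}noNamespaceSchemaLocation" % XSI, "co_ehs.xsd")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The serializer adds the `xmlns:xsi` declaration to the root element on its own once the prefix has been registered.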

BTW: you can accept an answer by clicking its check mark.

import csv, datetime, os, sys
from lxml import etree

def makeFacet(event, newheaders, ev, facetname, catname, count, nhposstart, nhposend):
    # Build one facets/categories/instance branch under the given event and
    # add a property for each csv column in the range [nhposstart, nhposend).
    facets = etree.SubElement(event, "facets", name=facetname)
    categories = etree.SubElement(facets, "categories", name=catname)
    instance = etree.SubElement(categories, "instance")
    instance.set("instance-tag", count)

    for i in range(nhposstart, nhposend):
        property = etree.SubElement(instance, "property")
        property.set("name", newheaders[i])
        property.set("value", ev[i].strip())


# read the csv
fh = "SO.csv"
rh = open(fh, "rU")

try:
    reader = csv.reader(rh)
    rlist = list(reader)
except csv.Error as e:
    sys.exit("file %s, line %d: %s" % (fh, reader.line_num, e))
finally:
    rh.close()

# generate the xml

# newheaders maps the csv columns to property names, because the csv column names don't correspond with the XML
newheaders = ["REPORT ID","NEXT REPORT ID","initial-event-date","number","title","summary-docket-num", "description-story"]

root = etree.Element("co_ehs")

object = etree.SubElement(root, "object")

object.set("id", "3") # Not sure about this one
object.set("object-type", "ehs_report")

# rlist[0] is the csv header row, so start at the first data row
for c, ev in enumerate(rlist[1:]):
    event = etree.SubElement(object, "event")
    event.set("event-tag", "%s"%c)
    makeFacet(event, newheaders, ev, "header", "processing", "%s"%c, 0, 4)
    makeFacet(event, newheaders, ev, "informational", "short_title", "%s"%c, 4, 5)
    makeFacet(event, newheaders, ev, "header", "file_num", "%s"%c, 5, 6)
    makeFacet(event, newheaders, ev, "informational", "short_narrative", "%s"%c, 6, 7)

print(etree.tostring(root, xml_declaration=True, encoding="UTF-8", pretty_print=True))
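
The asker's comments ask how to loop over the zipped mappings rather than hard-code column ranges. A minimal sketch of that idea follows, assuming Python 3 and the standard library's xml.etree.ElementTree in place of lxml (the element-building calls used here exist in both). `build_tree`, `names`, and the inline `SAMPLE` data are made up for this illustration. The fixed `id="3"` and `instance-tag="0"` values are simply copied from the desired output, because the thread never settles the rules for them.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import groupby

# The three mapping lists from the question, in csv column order.
facets = ["header", "header", "header", "header",
          "informational", "header", "informational"]
categories = ["processing", "processing", "processing", "processing",
              "short_title", "file_num", "short_narrative"]
names = ["REPORT ID", "NEXT REPORT ID", "initial-event-date", "number",
         "title", "summary-docket-num", "description-story"]
mappings = list(zip(facets, categories, names))

# Two rows of the question's csv, inlined so the sketch runs without SO.csv.
SAMPLE = '''C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"
'''

def build_tree(rows):
    """Build co_ehs/object/event/... with one event per csv data row."""
    root = ET.Element("co_ehs")
    # Placeholder attribute values taken from the desired output.
    obj = ET.SubElement(root, "object", {"id": "3", "object-type": "ehs_report"})
    for count, row in enumerate(rows):
        event = ET.SubElement(obj, "event", {"event-tag": str(count)})
        # Pair each mapping with its column value, then group consecutive
        # columns that share the same (facet, category).
        paired = zip(mappings, row)
        for (facet, category), group in groupby(paired, key=lambda p: p[0][:2]):
            facets_el = ET.SubElement(event, "facets", name=facet)
            cat_el = ET.SubElement(facets_el, "categories", name=category)
            instance = ET.SubElement(cat_el, "instance", {"instance-tag": "0"})
            for (_, _, prop_name), value in group:
                ET.SubElement(instance, "property",
                              name=prop_name, value=value.strip())
    return root

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
root = build_tree(reader)
print(ET.tostring(root, encoding="unicode"))
```

Because groupby only merges adjacent items, the two "header" groups (processing and file_num) stay separate, as they do in the desired output.
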
0

I created a file named 'pattern.txt' with the following content (with this indentation).

Note the 8 %s placeholders placed at strategic positions.

 <event event-tag="%s"> 
      <facets name="header"> 
      <categories name="processing"> 
       <instance instance-tag="0"> 
       <property name="REPORT ID" value="%s"/> 
       <property name="NEXT REPORT ID" value="%s"/> 
       <property name="initial-event-date" value="%s"/> 
       <property name="number" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_title"> 
       <instance instance-tag="0">
       <property name="title" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="header"> 
      <categories name="file_num"> 
       <instance instance-tag="0">
       <property name="summary-docket-num" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_narrative"> 
       <instance instance-tag="0">
       <property name="description-story" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
     </event> 

I created the file 'SO.csv' with the following content:

C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION 
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice" 
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device" 
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot" 

and I ran the following code:

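import csv

rid = csv.reader(open('SO.csv','rb'))
rid.next()  # skip the header row

with open('pattern.txt') as f:
    pati = f.read()

# the XML declaration must start at the very first character of the output
xmloutput = ['<?xml version="1.0" encoding="UTF-8"?>',
      ' <co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '\
      'xsi:noNamespaceSchemaLocation="co_ehs.xsd">',
      '  <object id="3" object-type="ehs_report">']

for i,row in enumerate(rid):
    # insert the counter as a single item; row[0:0] = str(i) would insert
    # one item per digit once i reaches 10
    row.insert(0, str(i))
    xmloutput.append(pati % tuple(row))

# close the elements opened in the header lines
xmloutput.append('  </object>')
xmloutput.append(' </co_ehs>')

print '\n'.join(xmloutput)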
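
One thing to watch with a %s template: csv values are substituted verbatim, so an `&`, `<` or `"` in the data would break the XML. A minimal sketch of escaping each value first, assuming Python 3 and the standard library's `xml.sax.saxutils.escape`; the one-line `PATTERN`, the helper name `attr_escape`, and the sample value are made up for this illustration and stand in for pattern.txt and a real csv field.

```python
from xml.sax.saxutils import escape

# Hypothetical one-line pattern standing in for pattern.txt.
PATTERN = '<property name="title" value="%s"/>'

def attr_escape(text):
    """Escape &, <, > and double quotes for a double-quoted attribute value."""
    return escape(text, {'"': "&quot;"})

# Made-up sample value containing characters that are special in XML.
raw = 'Slip & fall near "Dock 4" <rear>'
line = PATTERN % attr_escape(raw)
print(line)
```

In the loop above, the same helper could be applied to every item of the row before building the tuple, for example `tuple(attr_escape(v) for v in row)`.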

Does this help you?