2014-11-06

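Writing to CSV with DictWriter when the fields are not known in advance

I am parsing a large chunk of text into dictionaries, with the end goal of creating a CSV file that uses the dictionary keys as column headers.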

csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

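The problem is that the dictionary for any given row may include a new, previously unseen key, and I then want the CSV to contain a column for that new key as well. In short, not all of my fields are known in advance, so I cannot compile the complete fieldnames list at the start.

Is there a recommended way to make csv.DictWriter add unknown fields to fieldnames rather than ignoring them? Simply changing fieldnames at that point would leave the rows already written with the wrong number of fields.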

Could you please provide a sample dictionary structure? – 2014-11-06 06:37:43

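The problem is that the dictionary keys are unknown before the code runs, but I want to be able to write the CSV from a list of dictionaries. Right now I build the whole list of dictionaries and then iterate over the keys to find the unique keys to use as fieldnames. However, as the dataset grows, I would like to be able to write a CSV before I have seen all the dictionaries. – Pranab 2014-11-06 08:41:43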

Pranab, please see my answer below. – 2014-11-06 15:26:35

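Answer

Instead of using DictWriter, which can be confusing in your case because dictionaries are unordered, I tried the writerow method of csv.writer. Here is what I did: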

"""
a) Collect all the keys of the dictionaries and sort them (sorting is optional).
b) Build a result list: for each dict, append the value for every header.
   If a key is missing, .get() returns None.
   The result list therefore holds one list per CSV row.
c) Write the header and then each row from the result list to the CSV file.
"""
import csv

data_dict = [{"Header_1": "data_1", "Header_2": "data_2", "Header_3": "data_3"},
             {"Header_1": "data_4", "Header_2": "data_5", "Header_3": "data_6"},
             {"Header_1": "data_7", "Header_2": "data_8", "Header_3": "data_9", "Header_4": "data_10"},
             {"Header_1": "data_11", "Header_3": "data_12"},
             {"Header_1": "data_13", "Header_2": "data_14", "Header_3": "data_15"}]

"""
    The third dict has an extra key/value pair.
    The fourth dict has no Header_2, so we expect a blank value there in the CSV file.
"""
# Every key that appears in any dict, sorted for a stable column order.
headers = sorted({key for _dict in data_dict for key in _dict})

# One row per dict; csv.writer writes None as an empty field.
result = [[_dict.get(header) for header in headers] for _dict in data_dict]

# Python 3: open in text mode with newline='' (the Python 2 original used 'wb').
with open('demo.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=';', dialect='excel',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(headers)
    spamwriter.writerows(result)
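
For comparison, the same two-pass idea can also be done with csv.DictWriter itself, since its restval parameter fills in keys that a row lacks. This is only a minimal Python 3 sketch: write_dicts is a hypothetical helper name, the sample rows are made up, and it still assumes the whole list of dictionaries is in memory before anything is written, so it does not solve the streaming case raised in the comments.

```python
import csv
import io


def write_dicts(rows, out, restval=''):
    """Write a list of dicts to the file-like object `out`.

    Columns are the union of all keys, in the order they are first seen.
    (Hypothetical helper, not part of the csv module.)
    """
    fieldnames = []
    seen = set()
    # First pass: collect every key that appears in any row.
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)

    # Second pass: DictWriter substitutes restval for missing keys.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval=restval)
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Made-up sample data: the second row introduces a new key and lacks Header_2.
rows = [
    {"Header_1": "data_1", "Header_2": "data_2"},
    {"Header_1": "data_3", "Header_3": "data_4"},
]
buf = io.StringIO()
columns = write_dicts(rows, buf)
csv_text = buf.getvalue()
```

To write to disk instead of a string buffer, pass a file opened with open('demo.csv', 'w', newline='') as the out argument.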
