您的问题两个不同的问题:
- 从字典其中的值是容器,而不是原语创建一个CSV文件。
对于第一个问题,解决方案通常是将容器类型转换为基本类型。最常用的方法是创建一个json字符串。因此,例如:
>>> import json
>>> x = [2, 4, 6, 8, 10]
>>> json_string = json.dumps(x)
>>> json_string
'[2, 4, 6, 8, 10]'
所以你的数据转换可能看起来像:
import json
def convert(datadict):
'''Generator which converts a dictionary of containers into a dictionary of json-strings.
args:
datadict(dict): dictionary which needs conversion
yield:
tuple: key and string
'''
for key, value in datadict.items():
yield key, json.dumps(value)
def dump_to_csv_using_dict(datadict, fields=None, filepath=None, delimiter=None):
'''Dumps a datadict value into csv
args:
datadict(list): list of dictionaries to dump
fieldnames(list): field sequence to use from the dictionary [default: sorted(datadict.keys())]
filepath(str): filepath to save to [default: 'tmp.csv']
delimiter(str): delimiter to use in csv [default: '|']
'''
fieldnames = sorted(datadict.keys()) if fields is None else fields
filepath = 'tmp.csv' if filepath is None else filepath
delimiter = '|' if not delimiter else delimiter
with open(filepath, 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='ignore', delimiter=delimiter)
writer.writeheader()
for each_dict in datadict:
writer.writerow(each_dict)
那么天真的转换是这样的:
# Conversion code
test_data = {
"review_id": [1, 2, 3, 4],
"text": [5, 6, 7, 8]}
}
converted_data = dict(convert(test_data))
data_list = [converted_data]
dump_to_csv(data_list)
- 创建一个实际上是两种不同数据集的某种合并的最终值。
要做到这一点,您需要找到一种方法来组合来自不同键的数据。这通常不是一个容易解决的问题。
这就是说,它很容易两个清单合并拉链。
>>> x = [2, 4, 6]
>>> y = [1, 3, 5]
>>> zip(y, x)
[(1, 2), (3, 4), (5, 6)]
此外,在事件你的列表是不一样的大小,python的itertools包提供了一个方法,izip_longest,这将产生回全拉链即使一个列表比另一个短。注意izip_longest返回一个生成器。
from itertools import izip_longest
>>> x = [2, 4]
>>> y = [1, 3, 5]
>>> z = izip_longest(y, x, fillvalue=None) # default fillvalue is None
>>> list(z) # z is a generator
[(1, 2), (3, 4), (5, None)]
因此,我们可以在这里添加其他功能:
from itertoops import izip_longest
def combine(data, fields=None, default=None):
'''Combines fields within data
args:
data(dict): a dictionary with lists as values
fields(list): a list of keys to combine [default: all fields in random order]
default: default fill value [default: None]
yields:
tuple: columns combined into rows
'''
fields = data.keys() if field is None else field
columns = [data.get(field) for field in fields]
for values in izip_longest(*columns, fillvalue=default):
yield values
现在我们可以用它来更新我们的初始转换。
def dump_to_csv(data, filepath=None, delimiter=None):
'''Dumps list into csv
args:
data(list): list of values to dump
filepath(str): filepath to save to [default: 'tmp.csv']
delimiter(str): delimiter to use in csv [default: '|']
'''
fieldnames = sorted(datadict.keys()) if fields is None else fields
filepath = 'tmp.csv' if filepath is None else filepath
delimiter = '|' if not delimiter else delimiter
with open(filepath, 'w') as csvfile:
writer = csv.writer(csvfile, delimiter=delimiter)
for each_row in data:
writer.writerow(each_dict)
# Conversion code
test_data = {
"review_id": [1, 2, 3, 4],
"text": [5, 6, 7, 8]}
}
combined_data = combine(test_data)
data_list = [combined_data]
dump_to_csv(data_list)
我相信'DictWriter'需要一个'dict'列表而不是'list'的'dict'列表。在这里看到的例子:https://docs.python.org/2/library/csv.html#csv.DictWriter – FamousJameous