2017-10-04

Reading a CSV file with multiple values per label

I have a CSV file with the following structure:

# Parameters :,   P1, P2, P3 
2012-01-01 01:01:55.000000,1,2,3,4,5,6,6,8,9 
2012-01-01 01:01:56.000000,4,9,2,0,2,1,1,6,8 
... 

How can I read it with Python (and optionally pandas) to get the following result?

{'2012-01-01 01:01:55.000000': {'P1': [1, 2, 3], 'P2': [4, 5, 6], 'P3': [6, 8, 9]}, 
'2012-01-01 01:01:56.000000': {'P1': [4, 9, 2], 'P2': [0, 2, 1], 'P3': [1, 6, 8]}} 

Thanks!

Can you add more rows and explain what your output should look like? One row isn't very descriptive. –

What is the difference between 1,2,3 and 4,5,6? What is the logic for assigning them to P1 or P2? – brddawg

@coldspeed: Done – Nicolargo

Answers

With pandas and numpy:

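import pandas as pd

# The first line holds the parameter names: ['P1', 'P2', 'P3']
with open('tst.csv') as f: 
    _, *params = map(str.strip, f.readline().split(',')) 

# Skip the '#' header line and use the timestamps as a DatetimeIndex
d1 = pd.read_csv(
    'tst.csv', comment='#', header=None, 
    index_col=0, parse_dates=True) 

i = d1.index.rename(None) 
v = d1.values 
# Split each row into len(params) equal chunks, then move the parameter axis first
t = v.reshape(v.shape[0], -1, v.shape[1] // len(params)).transpose(1, 0, 2) 

pd.DataFrame(dict(zip(params, t.tolist())), i) 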

          P1   P2   P3 
2012-01-01 01:01:55 [1, 2, 3] [4, 5, 6] [6, 8, 9] 
2012-01-01 01:01:56 [4, 9, 2] [0, 2, 1] [1, 6, 8] 
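To get the nested dict the question asks for instead of a DataFrame, the frame can be converted with `DataFrame.to_dict(orient='index')`. A minimal sketch, assuming the sample rows are inlined in a string and read through `io.StringIO` (instead of `tst.csv`), and that the timestamps should stay as strings, so `parse_dates` is left off:

```python
import io

import pandas as pd

# Sample input inlined for illustration (assumption: same layout as tst.csv)
raw = (
    "# Parameters :,   P1, P2, P3\n"
    "2012-01-01 01:01:55.000000,1,2,3,4,5,6,6,8,9\n"
    "2012-01-01 01:01:56.000000,4,9,2,0,2,1,1,6,8\n"
)

# Parameter names from the commented header line: ['P1', 'P2', 'P3']
params = [p.strip() for p in raw.splitlines()[0].split(",")[1:]]

# The '#' line is skipped; without parse_dates the index keeps the raw strings
d1 = pd.read_csv(io.StringIO(raw), comment="#", header=None, index_col=0)

v = d1.to_numpy()
# (rows, values) -> (params, rows, values per parameter)
t = v.reshape(v.shape[0], len(params), -1).transpose(1, 0, 2)
df = pd.DataFrame(dict(zip(params, t.tolist())), index=d1.index.rename(None))

# {timestamp: {'P1': [...], 'P2': [...], 'P3': [...]}}
result = df.to_dict(orient="index")
print(result)
```

Because `t.tolist()` produces plain Python lists of ints, the values come out as integers, matching the question's expected output.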

Without pandas or numpy:

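with open('tst.csv') as f: 
    # Parameter names from the header line: ['P1', 'P2', 'P3']
    _, *params = map(str.strip, f.readline().split(',')) 
    k = len(params) 
    # For each row, split the values after the timestamp into k equal slices
    d = {ts: dict(zip(
             params, 
             (data[i*len(data)//k:(i+1)*len(data)//k] for i in range(k)) 
         )) for ts, *data in map(lambda x: x.strip().split(','), f.readlines())} 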

d 

{'2012-01-01 01:01:55.000000': {'P1': ['1', '2', '3'], 
           'P2': ['4', '5', '6'], 
           'P3': ['6', '8', '9']}, 
'2012-01-01 01:01:56.000000': {'P1': ['4', '9', '2'], 
           'P2': ['0', '2', '1'], 
           'P3': ['1', '6', '8']}} 
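The version above keeps the values as strings, while the question's expected output has integers. A minimal sketch that converts them with `int()` and skips blank lines (for example an empty last line); the function name `read_grouped` is hypothetical, and it assumes every row holds a multiple of `len(params)` values:

```python
import io


def read_grouped(f):
    """Parse a file-like object whose first line names the parameters."""
    # Header line: first cell is a label, the rest are parameter names
    _, *params = (s.strip() for s in f.readline().split(","))
    k = len(params)
    result = {}
    for line in f:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        ts, *values = line.split(",")
        values = [int(v) for v in values]
        n = len(values) // k  # values per parameter
        result[ts] = {p: values[i * n:(i + 1) * n] for i, p in enumerate(params)}
    return result


sample = io.StringIO(
    "# Parameters :,   P1, P2, P3\n"
    "2012-01-01 01:01:55.000000,1,2,3,4,5,6,6,8,9\n"
    "2012-01-01 01:01:56.000000,4,9,2,0,2,1,1,6,8\n"
    "\n"
)
print(read_grouped(sample))
```

With a real file, pass the open file object (`with open('tst.csv') as f: read_grouped(f)`) instead of the `StringIO` sample.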

With a csv.reader object and the itertools.islice() function:

import csv, itertools 

with open('test.csv', 'r', newline='') as f: 
    reader = csv.reader(f, delimiter=',', skipinitialspace=True) 
    header = next(reader)[1:] # getting `P<number>` keys 
    d = {} 
    for row in reader: 
        # row[0] is the timestamp; take 3 consecutive values per parameter
        d[row[0]] = {header[i]: list(itertools.islice(row[1:], i*3, i*3+3)) for i in range(len(header))} 

print(d) 

The output (for the 3 input lines):

{'2012-01-01 01:01:55.000000': {'P2': ['4', '5', '6'], 'P1': ['1', '2', '3'], 'P3': ['6', '8', '9']}, '2012-01-01 01:01:56.000000': {'P2': ['0', '2', '1'], 'P1': ['4', '9', '2'], 'P3': ['1', '6', '8']}} 
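The loop above hard-codes three values per parameter (`i*3`). A minimal sketch that derives the chunk size from the number of header names instead, still using `itertools.islice`; the function name `read_with_csv` is hypothetical, the values stay as strings, and it assumes each row's value count is a multiple of the number of parameters:

```python
import csv
import io
import itertools


def read_with_csv(f):
    reader = csv.reader(f, skipinitialspace=True)
    # Drop the '# Parameters :' cell; strip stray trailing spaces from names
    header = [h.strip() for h in next(reader)[1:]]
    d = {}
    for row in reader:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        ts = row[0]
        values = [v.strip() for v in row[1:]]
        size = len(values) // len(header)  # values per parameter
        d[ts] = {
            name: list(itertools.islice(values, i * size, (i + 1) * size))
            for i, name in enumerate(header)
        }
    return d


sample = io.StringIO(
    "# Parameters :,   P1, P2, P3\n"
    "2012-01-01 01:01:55.000000,1,2,3,4,5,6,6,8,9\n"
    "2012-01-01 01:01:56.000000,4,9,2,0,2,1,1,6,8\n"
)
print(read_with_csv(sample))
```

On Python 3.7 and later, the inner dicts keep the header order P1, P2, P3, since plain dicts preserve insertion order.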

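Note that dictionaries in Python are unordered structures (this applies before Python 3.7; since 3.7, dicts preserve insertion order).
To get an ordered structure, define the result dict as an OrderedDict object (from the collections module):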

import collections
... 
d = collections.OrderedDict() 

In this case, the result would be:

OrderedDict([('2012-01-01 01:01:55.000000', {'P1': ['1', '2', '3'], 'P2': ['4', '5', '6'], 'P3': ['6', '8', '9']}), ('2012-01-01 01:01:56.000000', {'P1': ['4', '9', '2'], 'P2': ['0', '2', '1'], 'P3': ['1', '6', '8']})])