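I am trying to write a simple script that converts a CSV output file from a Fortran code into a pandas DataFrame object, so that I can do further analysis instead of manipulating the CSV file in its long format. The CSV has two columns, but it is made up of multiple appended blocks of data, each with shape [N,2] (each sample name has the format RN_x). I got the following code, but the resulting DataFrame object does not allow analysis. I have also attached a sample file (greatly shortened from the original). By the way, the first column in the data file is meant to be a date, but in the output it is a number corresponding to the day of the simulation. Any advice would be much appreciated. Using numpy or pandas.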
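import csv

import numpy as np
import pandas as pd

# Read every row of the file into a list of [first, second] fields
with open('C:/data/Test.csv', 'r') as f:
    data = list(csv.reader(f))

# Column-major reshape, 11 rows per block: first fields land in columns 0-3,
# second fields in columns 4-7 (for the four-block sample below)
a = np.array(data).reshape(11, -1, order='F')
col = a[0, :4].reshape(4)               # sample names RN_x
row = pd.Index(a[4:, 0:1].reshape(7))   # day numbers of the computed rows
b = a[4:, 4:]                           # computed values, still strings
df = pd.DataFrame(b, index=row, columns=col)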
Sample:
RN_48865,
1,Observed
1,0
259,Computed
1,0.000014
91,0.000014
182,0.000014
274,0.000014
366,0.000014
457,0.000014
548,0.000014
RN_7445,
1,Observed
1,0
259,Computed
1,0.000013
91,0.000013
182,0.000013
274,0.000013
366,0.000013
457,0.000013
548,0.000013
RN_9288,
1,Observed
1,0
259,Computed
1,0.000011
91,0.000011
182,0.000011
274,0.000011
366,0.000011
457,0.000011
548,0.000011
RN_10955,
1,Observed
1,0
259,Computed
1,0.000014
91,0.000014
182,0.000014
274,0.000014
366,0.000014
457,0.000014
548,0.000014
Example output:
Index,RN_48865,RN_7445,RN_9288,RN_10955
1,0.000014,0.000013,0.000011,0.000014
91,0.000014,0.000013,0.000011,0.000014
182,0.000014,0.000013,0.000011,0.000014
274,0.000014,0.000013,0.000011,0.000014
366,0.000014,0.000013,0.000011,0.000014
457,0.000014,0.000013,0.000011,0.000014
548,0.000014,0.000013,0.000011,0.000014
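For illustration, one possible way to get from the sample to this layout is to walk the file row by row and track which block and which section the current row belongs to. This is only a sketch under assumptions: the names `read_blocks`, `SAMPLE` and `section` are made up here, a block is assumed to start at any row whose first field begins with `RN_`, and the row counts in front of `Observed` and `Computed` are ignored because they do not match the shortened sample.

```python
import csv
import io

import pandas as pd

# Two shortened blocks in the same layout as the sample above
SAMPLE = """\
RN_48865,
1,Observed
1,0
259,Computed
1,0.000014
91,0.000014
RN_7445,
1,Observed
1,0
259,Computed
1,0.000013
91,0.000013
"""


def read_blocks(handle, section="Computed"):
    """Collect the rows of one section per RN_x block into a wide DataFrame.

    Hypothetical helper: the block and section markers are assumptions
    based on the sample file, not a documented format.
    """
    series = {}      # sample name -> {day number: value}
    name = None      # current RN_x block
    current = None   # current section, "Observed" or "Computed"
    for row in csv.reader(handle):
        if not row:
            continue
        first = row[0].strip()
        second = row[1].strip() if len(row) > 1 else ""
        if first.startswith("RN_"):
            # A new block starts: remember its name, reset the section
            name = first
            series[name] = {}
            current = None
        elif second in ("Observed", "Computed"):
            # Section header; the count in the first field is not used
            current = second
        elif name is not None and current == section:
            # Data row of the requested section, converted to numbers
            series[name][int(first)] = float(second)
    df = pd.DataFrame(series).sort_index()
    df.index.name = "Index"
    return df


df = read_blocks(io.StringIO(SAMPLE))
print(df)
```

For the real file, the same function could be called on a file handle, for example one opened with `open('C:/data/Test.csv', newline='')`. Because the values are converted with `float`, the resulting columns are numeric rather than strings.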
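So, what is the question? – cyborg
Sorry for being unclear. How do I turn the long file into a DataFrame object whose index comes from the first data column (parsed as dates by counting the numbers from a base date, e.g. 1 January 1995), with the data filled in from the second column and the multiple "RN_x" labels as column labels? The original long file has repeated blocks of data representing the output at different "locations". I would like to be able to analyze the statistics for each location. – user2989613
I don't understand "with the data filled in from the second column and the multiple 'RN_x' labels as column labels". Why don't you simply show the data (with '\n's)? – cyborg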
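The date index and per-location statistics described in the comments could be sketched as follows. This is a rough illustration, not a confirmed answer: it assumes a wide DataFrame like the example output already exists, that the base date is 1 January 1995 as mentioned in the comment, and that day 1 corresponds to the base date itself. The name `BASE_DATE` is made up for the example.

```python
import pandas as pd

# Small stand-in for the wide table shown in the example output
df = pd.DataFrame(
    {
        "RN_48865": [0.000014, 0.000014, 0.000014],
        "RN_7445": [0.000013, 0.000013, 0.000013],
    },
    index=pd.Index([1, 91, 182], name="Index"),
)

# Assumed base date, taken from the example given in the comment
BASE_DATE = pd.Timestamp("1995-01-01")

# Assumption: day 1 is the base date itself, hence the "- 1"
df.index = BASE_DATE + pd.to_timedelta(df.index - 1, unit="D")
df.index.name = "Date"

# Summary statistics per location (one column per RN_x)
stats = df.describe()

print(df)
print(stats)
```

If day 1 should instead mean one day after the base date, the `- 1` would simply be dropped.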