2016-11-15

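I need to check a list of index values every day, and to make them easier to read I put them into a DataFrame. I am using Python 2.7 and want to write the results of a loop into a pandas DataFrame.

First, I write my results to a list: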

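import pandas as pd

# df1 ... df7 are existing DataFrames with 'Datetime' and 'IDXType' columns
index_list = [df1, df2, df3, df4, df5, df6, df7]
value_list = [20, 22, 28, 29, 30, 31, 32, 33]
myarray = []

def minimum(dataframe, value):
    # earliest Datetime among the rows whose IDXType equals value
    return dataframe['Datetime'][(dataframe["IDXType"] == value)].min()

for i in index_list:
    for value_i in value_list:
        myarray.append(minimum(i, value_i))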

This produces a list of 56 elements, which I then convert into a DataFrame by hand:

result = {'df1': pd.Series(myarray[0:8], index=value_list),
          'df2': pd.Series(myarray[8:16], index=value_list),
          'df3': pd.Series(myarray[16:24], index=value_list),
          'df4': pd.Series(myarray[24:32], index=value_list),
          'df5': pd.Series(myarray[32:40], index=value_list),
          'df6': pd.Series(myarray[40:48], index=value_list),
          'df7': pd.Series(myarray[48:56], index=value_list),
          }
result = pd.DataFrame(result)
result

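This displays an 8 × 7 DataFrame, like the following:

[Image: Expected Result]

Is there a shortcut for this procedure? For example, can I write the results from the loop directly into a DataFrame?

My list keeps growing, so I cannot keep fixing my code every other day.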


Is `index_list` a `list` of `DataFrames` with the columns `Datetime` and `IDXType`? – jezrael


index_list is a list of DataFrames that contain those columns. Datetime and IDXType are the two columns I have to check in the original source DataFrames. –

Answer


You can use the following, starting from some sample data:

import pandas as pd

df1 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [20, 20, 33, 33, 33]})

print (df1)
    Datetime  IDXType
0 2015-01-04       20
1 2015-01-05       20
2 2015-01-06       33
3 2015-01-07       33
4 2015-01-08       33

df2 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [30, 30, 21, 21, 10]})

print (df2)
    Datetime  IDXType
0 2015-01-04       30
1 2015-01-05       30
2 2015-01-06       21
3 2015-01-07       21
4 2015-01-08       10

df3 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [20, 20, 30, 31, 31]})

print (df3)
    Datetime  IDXType
0 2015-01-04       20
1 2015-01-05       20
2 2015-01-06       30
3 2015-01-07       31
4 2015-01-08       31

index_list = [df1, df2, df3]
value_list = [20, 22, 28, 29, 30, 31, 32, 33]
myarray = []

def minimum(dataframe, value):
    return dataframe.loc[dataframe["IDXType"] == value, 'Datetime'].min()

for i in index_list:
    for value_i in value_list:
        myarray.append(minimum(i, value_i))
#print (myarray)

result = {
    'df1': pd.Series(myarray[0:8], index=value_list),
    'df2': pd.Series(myarray[8:16], index=value_list),
    'df3': pd.Series(myarray[16:24], index=value_list)
}
result = pd.DataFrame(result)
print (result)
          df1        df2        df3
20 2015-01-04        NaT 2015-01-04
22        NaT        NaT        NaT
28        NaT        NaT        NaT
29        NaT        NaT        NaT
30        NaT 2015-01-04 2015-01-06
31        NaT        NaT 2015-01-07
32        NaT        NaT        NaT
33 2015-01-06        NaT        NaT

My solution uses `groupby` with the aggregation `min`, then `concat` and `reindex`, and finally removes the index name with `rename_axis` (new in pandas 0.18.0):

print (df1.groupby('IDXType')['Datetime'].min())
IDXType
20   2015-01-04
33   2015-01-06
Name: Datetime, dtype: datetime64[ns]

df = pd.concat([df1.groupby('IDXType')['Datetime'].min(),
                df2.groupby('IDXType')['Datetime'].min(),
                df3.groupby('IDXType')['Datetime'].min()],
               axis=1,
               keys=('df1', 'df2', 'df3')).reindex(value_list).rename_axis(None)
print (df)
          df1        df2        df3
20 2015-01-04        NaT 2015-01-04
22        NaT        NaT        NaT
28        NaT        NaT        NaT
29        NaT        NaT        NaT
30        NaT 2015-01-04 2015-01-06
31        NaT        NaT 2015-01-07
32        NaT        NaT        NaT
33 2015-01-06        NaT        NaT
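
To make the chained call easier to follow, here is the same pipeline split into separate steps. This is only an illustrative sketch: it uses just two of the sample frames, and the intermediate names `mins` and `wide` are not part of the original answer.

```python
import pandas as pd

# Two of the sample frames from the answer (assumed data, for illustration).
df1 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [20, 20, 33, 33, 33]})
df2 = pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                    'IDXType': [30, 30, 21, 21, 10]})
value_list = [20, 22, 28, 29, 30, 31, 32, 33]

# Step 1: earliest Datetime per IDXType, one Series per frame.
mins = [d.groupby('IDXType')['Datetime'].min() for d in (df1, df2)]

# Step 2: place the Series side by side, aligned on IDXType;
# the keys become the column names.
wide = pd.concat(mins, axis=1, keys=['df1', 'df2'])

# Step 3: keep only the wanted IDXType values, in the given order;
# values that never occur in a frame show up as NaT.
wide = wide.reindex(value_list)

# Step 4: drop the index name ('IDXType') so the header prints cleanly.
wide = wide.rename_axis(None)

print(wide)
```

The `reindex` step is what replaces the manual slicing of `myarray`: rows are matched by `IDXType` value rather than by position, so the result does not depend on how many values are in `value_list`.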

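You can also use a more dynamic solution: use a list comprehension inside `concat`. This requires an extra list with the column names (`namesdf`) for the new DataFrame `df5`: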

index_list = [df1, df2, df3]
value_list = [20, 22, 28, 29, 30, 31, 32, 33]
namesdf = ['df1', 'df2', 'df3']
df5 = pd.concat([x.groupby('IDXType')['Datetime'].min() for x in index_list],
                axis=1,
                keys=namesdf).reindex(value_list).rename_axis(None)
print (df5)
          df1        df2        df3
20 2015-01-04        NaT 2015-01-04
22        NaT        NaT        NaT
28        NaT        NaT        NaT
29        NaT        NaT        NaT
30        NaT 2015-01-04 2015-01-06
31        NaT        NaT 2015-01-07
32        NaT        NaT        NaT
33 2015-01-06        NaT        NaT
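
If keeping `index_list` and `namesdf` in sync becomes tedious as the list grows, one option is to store each frame under its name in a single dict. The sketch below assumes that layout; the dict `frames` and the helper `earliest_by_type` are hypothetical names introduced here, not part of the original answer.

```python
import pandas as pd

def earliest_by_type(frames, value_list):
    """Return one column per named frame, holding the earliest Datetime
    for each IDXType in value_list (NaT where the value never occurs)."""
    names = list(frames)  # hypothetical helper: column order follows this list
    mins = [frames[n].groupby('IDXType')['Datetime'].min() for n in names]
    return (pd.concat(mins, axis=1, keys=names)
              .reindex(value_list)
              .rename_axis(None))

# Assumed sample data, mirroring two of the frames in the answer.
frames = {
    'df1': pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                         'IDXType': [20, 20, 33, 33, 33]}),
    'df3': pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                         'IDXType': [20, 20, 30, 31, 31]}),
}
value_list = [20, 22, 28, 29, 30, 31, 32, 33]

result = earliest_by_type(frames, value_list)
print(result)
```

With this layout, adding another source frame or another value means adding one dict entry or one list element; the call itself stays unchanged.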

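Thanks for the `reindex` and `concat` ideas. My biggest problem is how to write directly into a DataFrame rather than converting from existing data (which means I have to change the DataFrame's size, names and so on every day). I need help going from loop -> list -> DataFrame to loop -> DataFrame. –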


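Hmm, but why do you need the loop? pandas works best when you avoid loops altogether. It seems I don't understand why my solution doesn't suit you; can you explain? – jezrael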


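Well, I have to go through each DataFrame (in index_list) and find the minimum value for the rows where a particular column (here "IDXType") equals a specific value (in value_list)... I don't know any other way, so I used nested loops... That may be a bad idea; do you have another approach? –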
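
For the loop -> DataFrame form asked about in the comments, one possible sketch keeps the question's `minimum` function and nested iteration but assigns each column to the result as it is computed, so no intermediate flat list or manual slicing is needed. The dict `frames` and its sample data are assumptions made for illustration.

```python
import pandas as pd

def minimum(dataframe, value):
    # Earliest Datetime among rows whose IDXType equals value (NaT if none).
    return dataframe.loc[dataframe['IDXType'] == value, 'Datetime'].min()

# Assumed sample data: frames stored under the column name they should get.
frames = {
    'df1': pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                         'IDXType': [20, 20, 33, 33, 33]}),
    'df2': pd.DataFrame({'Datetime': pd.date_range('2015-01-04', '2015-01-08'),
                         'IDXType': [30, 30, 21, 21, 10]}),
}
value_list = [20, 22, 28, 29, 30, 31, 32, 33]

# Start with an empty frame indexed by the values of interest,
# then add one column per source frame inside the loop.
result = pd.DataFrame(index=value_list)
for name, frame in frames.items():
    result[name] = [minimum(frame, v) for v in value_list]

print(result)
```

This still filters each frame once per value, so the `groupby` version in the answer is likely to scale better; the loop form is shown only because it maps directly onto the original code.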