2015-12-23 31 views
1

Iterating over two pandas DataFrames to count categories within a date range

I have two pandas DataFrames (df1 and df2).

df1 has 12 columns, of which a1, a2, ..., a9 are empty. Here is a sample of df1:

Stock Start_Date   End_Date  a1 a2 a3 a4 .... a9 
A 09-12-2015 20:04 10-12-2015 23:04     
B 09-12-2015 10:04 09-12-2015 20:14     
A 11-12-2015 00:22 11-12-2015 08:04     
C 08-12-2015 06:56 10-12-2015 20:54     

df2 has 4 columns. Here is a sample:

Stock date_time  Opening closing 
A 09-12-2015 21:24 144.3 10 
A 09-12-2015 21:27 225.51 24 
B 09-12-2015 10:20 134.42 11 
A 09-12-2015 20:04 231.22 17 
B 09-12-2015 10:24 399.55 32 
A 09-12-2015 20:04 246.77 21 
B 09-12-2015 14:22 76.23 8 
C 08-12-2015 09:44 232.22 15 
C 09-12-2015 20:04 222.91 12 
A 11-12-2015 02:06 93.21 7 
B 09-12-2015 20:04 211.36 26 
C 09-12-2015 20:04 111.21 8 

The output I want is df1 filled in like this:

Stock Start_Date  End_Date   a1 a2 a3 a4 ....a9 
A 09-12-2015 20:04 10-12-2015 23:04 0 2 2 0  0 
B 09-12-2015 10:04 09-12-2015 20:14 1 1 2 0  0 
A 11-12-2015 00:22 11-12-2015 08:04 1 0 0 0  0 
C 08-12-2015 06:56 10-12-2015 20:54 0 0 0 1  0 

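That is, for each combination of Stock, Start_Date and End_Date in df1, the result should hold the number of df2 rows for that stock that fall within the datetime range, counted per category. In this final output, a1 = count[Opening (0-100) & closing (0-10)], a2 = count[Opening (101-200) & closing (11-20)], a3 = count[Opening (201-400) & closing (21-50)], a4 = count[Opening (0-100) & closing (11-20)], and so on, for all 9 combinations.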

I have R code for this, but it does not perform well on larger datasets. Can anyone help me do this in python/pandas? Any help is appreciated!

Answers

1

You can try this solution. I removed the empty columns from df1, but it works with them too:

#merge dataframes by Stock, select datetimes between start and end 
df = df1.merge(df2,on='Stock', how='left') 
df = df[(df.date_time >= df.Start_Date) & (df.date_time <= df.End_Date)] 
#remove column date_time 
df = df.drop(['date_time'], axis=1) 
print(df) 
# Stock   Start_Date   End_Date Opening closing 
#0  A 2015-09-12 20:04:00 2015-10-12 23:04:00 144.30  10 
#1  A 2015-09-12 20:04:00 2015-10-12 23:04:00 225.51  24 
#2  A 2015-09-12 20:04:00 2015-10-12 23:04:00 231.22  17 
#3  A 2015-09-12 20:04:00 2015-10-12 23:04:00 246.77  21 
#5  B 2015-09-12 10:04:00 2015-09-12 20:14:00 134.42  11 
#6  B 2015-09-12 10:04:00 2015-09-12 20:14:00 399.55  32 
#7  B 2015-09-12 10:04:00 2015-09-12 20:14:00 76.23  8 
#8  B 2015-09-12 10:04:00 2015-09-12 20:14:00 211.36  26 
#13  A 2015-11-12 00:22:00 2015-11-12 08:04:00 93.21  7 
#14  C 2015-08-12 06:56:00 2015-10-12 20:54:00 232.22  15 
#15  C 2015-08-12 06:56:00 2015-10-12 20:54:00 222.91  12 
#16  C 2015-08-12 06:56:00 2015-10-12 20:54:00 111.21  8 

#values to new columns by conditions - cast boolean to integers 
df['a1'] = ((df.Opening.between(0,100)) & (df.closing.between(0,10))).astype(int) 
df['a2'] = ((df.Opening.between(101,200)) & (df.closing.between(11,20))).astype(int) 
#add other columns like a1 and a2 
print(df) 
# Stock   Start_Date   End_Date Opening closing a1 a2 
#0  A 2015-09-12 20:04:00 2015-10-12 23:04:00 144.30  10 0 0 
#1  A 2015-09-12 20:04:00 2015-10-12 23:04:00 225.51  24 0 0 
#2  A 2015-09-12 20:04:00 2015-10-12 23:04:00 231.22  17 0 0 
#3  A 2015-09-12 20:04:00 2015-10-12 23:04:00 246.77  21 0 0 
#5  B 2015-09-12 10:04:00 2015-09-12 20:14:00 134.42  11 0 1 
#6  B 2015-09-12 10:04:00 2015-09-12 20:14:00 399.55  32 0 0 
#7  B 2015-09-12 10:04:00 2015-09-12 20:14:00 76.23  8 1 0 
#8  B 2015-09-12 10:04:00 2015-09-12 20:14:00 211.36  26 0 0 
#13  A 2015-11-12 00:22:00 2015-11-12 08:04:00 93.21  7 1 0 
#14  C 2015-08-12 06:56:00 2015-10-12 20:54:00 232.22  15 0 0 
#15  C 2015-08-12 06:56:00 2015-10-12 20:54:00 222.91  12 0 0 
#16  C 2015-08-12 06:56:00 2015-10-12 20:54:00 111.21  8 0 0 

#groupby and sum rows 
df = df.groupby(['Stock', 'Start_Date', 'End_Date']).sum() 
df = df.drop(['Opening', 'closing'], axis=1) 
print(df.reset_index()) 
# Stock   Start_Date   End_Date a1 a2 
#0  A 2015-09-12 20:04:00 2015-10-12 23:04:00 0 0 
#1  A 2015-11-12 00:22:00 2015-11-12 08:04:00 1 0 
#2  B 2015-09-12 10:04:00 2015-09-12 20:14:00 1 1 
#3  C 2015-08-12 06:56:00 2015-10-12 20:54:00 0 0 
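
For reference, the steps above can be put together as one self-contained sketch. It is not the answer's exact code: it assumes the timestamps are day-first (`09-12-2015` meaning 9 December, whereas the printed output above shows them parsed month-first), it covers only the four categories the question spells out, and the names `categories`, `keys`, `merged`, `counts` and `result` are introduced here for illustration. The final merge back onto df1 is an added step so that a row with no matching df2 rows would still appear with zero counts.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Stock": ["A", "B", "A", "C"],
    "Start_Date": ["09-12-2015 20:04", "09-12-2015 10:04",
                   "11-12-2015 00:22", "08-12-2015 06:56"],
    "End_Date": ["10-12-2015 23:04", "09-12-2015 20:14",
                 "11-12-2015 08:04", "10-12-2015 20:54"],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "B", "A", "B", "A", "B", "C", "C", "A", "B", "C"],
    "date_time": ["09-12-2015 21:24", "09-12-2015 21:27", "09-12-2015 10:20",
                  "09-12-2015 20:04", "09-12-2015 10:24", "09-12-2015 20:04",
                  "09-12-2015 14:22", "08-12-2015 09:44", "09-12-2015 20:04",
                  "11-12-2015 02:06", "09-12-2015 20:04", "09-12-2015 20:04"],
    "Opening": [144.3, 225.51, 134.42, 231.22, 399.55, 246.77,
                76.23, 232.22, 222.91, 93.21, 211.36, 111.21],
    "closing": [10, 24, 11, 17, 32, 21, 8, 15, 12, 7, 26, 8],
})

# Assumption: dates are day-first, so parse them with an explicit format.
fmt = "%d-%m-%Y %H:%M"
for col in ("Start_Date", "End_Date"):
    df1[col] = pd.to_datetime(df1[col], format=fmt)
df2["date_time"] = pd.to_datetime(df2["date_time"], format=fmt)

# Only the four categories stated in the question: (Opening range, closing range).
# Bounds are inclusive; an Opening such as 100.5 would fall in no category.
categories = {
    "a1": ((0, 100), (0, 10)),
    "a2": ((101, 200), (11, 20)),
    "a3": ((201, 400), (21, 50)),
    "a4": ((0, 100), (11, 20)),
}
keys = ["Stock", "Start_Date", "End_Date"]

# Pair every df1 row with the df2 rows of the same stock, keep those in range.
merged = df1[keys].merge(df2, on="Stock", how="left")
in_range = merged["date_time"].between(merged["Start_Date"], merged["End_Date"])
merged = merged[in_range].copy()

# One 0/1 flag column per category.
for name, ((o_lo, o_hi), (c_lo, c_hi)) in categories.items():
    merged[name] = (merged["Opening"].between(o_lo, o_hi)
                    & merged["closing"].between(c_lo, c_hi)).astype(int)

# Sum the flags per df1 row, then merge back so empty groups become 0.
cols = list(categories)
counts = merged.groupby(keys)[cols].sum().reset_index()
result = df1[keys].merge(counts, on=keys, how="left")
result[cols] = result[cols].fillna(0).astype(int)
print(result)
```

Because the merge pairs each df1 row with every df2 row of the same stock before filtering, the intermediate frame can grow large when a stock has many rows in both frames.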
+0

How does it work? – jezrael

+0

Thanks, works flawlessly. One more thing: what if I have another column (double or float) in df1? Can I get it in the final output by changing 'how' in the merge? – warwick12

+1

I think 'on' in the function 'merge' is what is used for matching - a better example with pictures is [here](http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra). 'df = df1.merge(df2, on='Stock', how='left')' is the same as 'df = pd.merge(df1, df2, on='Stock', how='left')'. – jezrael
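
A minimal sketch of one way to carry an extra df1 column into the final output, offered as a suggestion rather than something confirmed in this thread. The column name `weight` and the small sample frames are hypothetical. The merge already brings every df1 column along regardless of `how`; the column is lost only at the groupby step, so one option is to add it to the grouping keys instead of summing it.

```python
import pandas as pd

# Illustrative data; "weight" is a hypothetical extra float column in df1.
df1 = pd.DataFrame({
    "Stock": ["A", "B"],
    "Start_Date": pd.to_datetime(["2015-12-09 20:04", "2015-12-09 10:04"]),
    "End_Date": pd.to_datetime(["2015-12-10 23:04", "2015-12-09 20:14"]),
    "weight": [0.5, 1.25],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "B"],
    "date_time": pd.to_datetime(["2015-12-09 21:24", "2015-12-09 21:27",
                                 "2015-12-09 14:22"]),
    "Opening": [44.3, 225.51, 76.23],
    "closing": [10, 24, 8],
})

# The method form and the function form produce the same frame.
merged = df1.merge(df2, on="Stock", how="left")
assert merged.equals(pd.merge(df1, df2, on="Stock", how="left"))

# Keep df2 rows inside each df1 date range, then flag category a1.
in_range = merged["date_time"].between(merged["Start_Date"], merged["End_Date"])
merged = merged[in_range].copy()
merged["a1"] = (merged["Opening"].between(0, 100)
                & merged["closing"].between(0, 10)).astype(int)

# Grouping by the extra column keeps it as-is instead of summing it.
keys = ["Stock", "Start_Date", "End_Date", "weight"]
out = merged.groupby(keys)["a1"].sum().reset_index()
print(out)
```

This relies on the extra column having a single value per df1 row, which holds here because it comes from df1 itself.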