2017-02-13 127 views

Group rows by assigning values as a pandas DataFrame column

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({
    'id': [458, 459, 464, 469, 507, 512, 516, 519, 519, 615]
})

I want to find the difference between each row and the next one (current row minus next row), so I did:

df['diff'] = df['id'] - df['id'].shift(-1)
df = df.fillna(1)  # fillna returns a new DataFrame, so assign the result back

    id  diff
0  458  -1.0
1  459  -5.0
2  464  -5.0
3  469 -38.0
4  507  -5.0
5  512  -4.0
6  516  -3.0
7  519   0.0
8  519 -96.0
9  615   1.0

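Now I want to group the rows based on the diff column: whenever the difference between two consecutive rows is greater than 10, a new group should start. That is, create a new column group, set all rows up to that point to 1, the following rows to 2, and so on.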

As you can see in the diff column, the difference between rows 3 and 4 is greater than 10:

    id  diff  group
0  458  -1.0      1
1  459  -5.0      1
2  464  -5.0      1
3  469 -38.0      1
4  507  -5.0      2
5  512  -4.0      2
6  516  -3.0      2
7  519   0.0      2
8  519 -96.0      2
9  615   1.0      3

Any idea how to achieve this?

Answers


You can use diff, compare the result with 10, take the cumsum of the resulting boolean mask, and finally add 1:

print(df['diff'].diff())
0     NaN
1    -4.0
2     0.0
3   -33.0
4    33.0
5     1.0
6     1.0
7     3.0
8   -96.0
9    97.0
Name: diff, dtype: float64

df['group'] = (df['diff'].diff() > 10).cumsum() + 1
print(df)
    id  diff  group
0  458  -1.0      1
1  459  -5.0      1
2  464  -5.0      1
3  469 -38.0      1
4  507  -5.0      2
5  512  -4.0      2
6  516  -3.0      2
7  519   0.0      2
8  519 -96.0      2
9  615   1.0      3
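
To see why this works, it may help to split the one-liner into its intermediate steps. The following is only a sketch of the same computation. The names step and mask are mine, and it assumes the NaN in the last row of diff has been filled with 1, as in the question's output.

```python
import pandas as pd

df = pd.DataFrame({'id': [458, 459, 464, 469, 507, 512, 516, 519, 519, 615]})

# Current row minus next row; the last row has no successor, so fill it with 1.
df['diff'] = (df['id'] - df['id'].shift(-1)).fillna(1)

# Step 1: change in 'diff' from one row to the next (the first row is NaN).
step = df['diff'].diff()

# Step 2: boolean mask, True where that change exceeds 10 (NaN compares as False).
mask = step > 10

# Step 3: running count of True values, plus 1 so that numbering starts at 1.
df['group'] = mask.cumsum() + 1

print(mask.tolist())
print(df['group'].tolist())
```

Each True in the mask marks the first row of a new group, so the cumulative sum goes up by one exactly at those rows.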

df = df.assign(group=df['diff'].diff().gt(10).cumsum().add(1))
print(df)
    id  diff  group
0  458  -1.0      1
1  459  -5.0      1
2  464  -5.0      1
3  469 -38.0      1
4  507  -5.0      2
5  512  -4.0      2
6  516  -3.0      2
7  519   0.0      2
8  519 -96.0      2
9  615   1.0      3
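
One caveat that may be worth keeping in mind: diff().diff() reacts to a jump in the diff column rather than to a large gap itself, so two large gaps in a row could end up merged into one group. If the intended rule is "start a new group whenever id is more than 10 above the previous row" (that is my reading of the question, not something stated in the answer), a minimal sketch that works on id directly could look like this. The second data set is hypothetical and only there to compare the two rules.

```python
import pandas as pd

df = pd.DataFrame({'id': [458, 459, 464, 469, 507, 512, 516, 519, 519, 615]})

# Gap to the previous row (the first row is NaN, which compares as False).
gap = df['id'].diff()

# A new group starts at every row whose gap exceeds the threshold of 10.
df['group'] = gap.gt(10).cumsum().add(1)
print(df['group'].tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 2, 3]

# Hypothetical data with two large gaps in a row, to compare both rules.
ids = pd.Series([1, 50, 100, 101])
by_gap = ids.diff().gt(10).cumsum().add(1)
d = (ids - ids.shift(-1)).fillna(1)
by_diff_of_diff = d.diff().gt(10).cumsum().add(1)
print(by_gap.tolist())           # [1, 2, 3, 3]
print(by_diff_of_diff.tolist())  # [1, 1, 2, 2]
```

On the data from the question both approaches give the same groups, so which one fits better depends on what should happen when large gaps follow each other.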

Works fine, sir! – Shubham


Super, I'm really glad ;) – jezrael


What I tried was: `counter = 1 while (df['diff'] - df['diff'].shift(-1)).any() < 10: df['group'] = counter counter = counter + 1`, but it went into an infinite loop :p – Shubham
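
A note on the loop in the comment above: the most likely reason it never ends is that .any() collapses the whole Series into a single boolean, and True < 10 is always true, while nothing inside the loop body changes that condition. If an explicit loop is preferred over the vectorized version, a sketch that terminates could look like the following. It assumes the same rule as the answer (a new group when diff rises by more than 10 from the previous row), and the names groups and prev are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [458, 459, 464, 469, 507, 512, 516, 519, 519, 615]})
df['diff'] = (df['id'] - df['id'].shift(-1)).fillna(1)

counter = 1
groups = []
prev = None  # 'diff' value of the previous row; None for the first row

# Iterate over the rows once, so the loop always ends after len(df) steps.
for value in df['diff']:
    # Start a new group when 'diff' rises by more than 10 from the previous row.
    if prev is not None and value - prev > 10:
        counter += 1
    groups.append(counter)
    prev = value

df['group'] = groups
print(df['group'].tolist())
```

For larger frames the cumsum version from the answer is usually the better choice, since it avoids a Python-level loop.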