2017-04-19

How to select rows from a pandas DataFrame that meet a certain condition

Suppose the DataFrame looks like this:

id class count 
0  A  2 
0  B  2 
0  C  2 
0  D  1 
1  A  3 
1  B  3 
1  E  2 
2  D  4 
2  F  2 
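
For anyone who wants to reproduce the example, the table above can be rebuilt as a DataFrame roughly like this. This is only a sketch: the variable name df and the integer dtypes are assumptions, since the question shows the data but not how it was created.

```python
import pandas as pd

# Sample data copied from the table above; the name `df` and the
# integer dtype of `id` and `count` are assumptions.
df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    'class': ['A', 'B', 'C', 'D', 'A', 'B', 'E', 'D', 'F'],
    'count': [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

print(df)
```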

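For each id, I want to find the class whose count is the largest. If several classes share the same maximum count, they should be merged into a single row. For the example above, the result should look like this: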

id  class count 
0  A,B,C  2 
1  A,B  3 
2  D   4 

How can this be done in pandas?

Answers


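Use transform to keep only the rows whose count equals the maximum of their group, then aggregate to join the classes: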

# g is not defined in the original snippet; grouping by id is assumed
g = df.groupby('id')
df = df[g['count'].transform('max').eq(df['count'])]
print(df)
   id class  count
0   0     A      2
1   0     B      2
2   0     C      2
4   1     A      3
5   1     B      3
7   2     D      4

df = df.groupby('id').agg({'class':','.join, 'count':'first'}).reset_index()
print(df)
   id  class  count
0   0  A,B,C      2
1   1    A,B      3
2   2      D      4
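
The filter and the aggregation can also be chained so that the original frame is not overwritten. The following is a minimal sketch, not part of the original answer: the helper name top_classes is hypothetical, and the sample data is rebuilt inline so the block runs on its own.

```python
import pandas as pd

def top_classes(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per id with the classes that reach the maximum count.

    `top_classes` is a hypothetical helper name used for illustration.
    """
    # Boolean mask: True where a row's count equals its group's maximum.
    is_max = frame['count'].eq(frame.groupby('id')['count'].transform('max'))
    return (
        frame[is_max]
        .groupby('id', as_index=False)
        # Join the surviving classes; all remaining counts in a group are equal.
        .agg({'class': ','.join, 'count': 'first'})
    )

# Sample data from the question.
df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    'class': ['A', 'B', 'C', 'D', 'A', 'B', 'E', 'D', 'F'],
    'count': [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

result = top_classes(df)
print(result)
```

Because every row left after the filter carries the group maximum, taking 'first' for count is safe here.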

Another solution, using a custom function:

def f(x): 
    x = x[x['count'] == x['count'].max()] 
    return (pd.Series([','.join(x['class'].values.tolist()), x['count'].iat[0]], 
         index=['class','count'])) 

df = df.groupby('id').apply(f).reset_index()
print(df)
   id  class  count
0   0  A,B,C      2
1   1    A,B      3
2   2      D      4

Option 1

s = df.set_index(['id', 'class'])['count'] 
s1 = s[s.eq(s.groupby(level=0).max())].reset_index() 
s1.groupby(
    ['id', 'count'] 
)['class'].apply(list).reset_index()[['id', 'class', 'count']] 

   id      class  count
0   0  [A, B, C]    2.0
1   1     [A, B]    3.0
2   2        [D]    4.0
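
Option 1 returns each class group as a list, whereas the question asks for a comma-separated string. A possible variation is sketched below. It differs from the answer in two ways, both of which are my own substitutions: it aggregates with ','.join instead of list, and it uses transform('max') for the comparison rather than relying on index alignment.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    'class': ['A', 'B', 'C', 'D', 'A', 'B', 'E', 'D', 'F'],
    'count': [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

s = df.set_index(['id', 'class'])['count']

# transform('max') returns a Series with the same index as s,
# so the comparison needs no cross-level alignment.
s1 = s[s.eq(s.groupby(level='id').transform('max'))].reset_index()

result = (
    s1.groupby(['id', 'count'])['class']
      .agg(','.join)                      # 'A,B,C' instead of ['A', 'B', 'C']
      .reset_index()[['id', 'class', 'count']]
)
print(result)
```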

Option 2

d1 = df.set_index(['id', 'class'])['count'].unstack() 

v = d1.values 
m = np.nanmax(v, 1) 
t = v == m[:, None] 
pd.DataFrame({
    'id': d1.index,
    'class': [list(s) for s in t.dot(d1.columns)],
    'count': m
})[['id', 'class', 'count']]

   id      class  count
0   0  [A, B, C]    2.0
1   1     [A, B]    3.0
2   2        [D]    4.0
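
Two things are worth noting about Option 2. The dot product concatenates the column labels with no separator, so list(s) only recovers the classes when every class name is a single character. The counts also come back as floats, because unstack fills the missing id/class pairs with NaN. The sketch below is one possible way around both, assuming the same sample data: it selects the column labels with each boolean row instead of using dot, and casts the maxima back to integers.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    'class': ['A', 'B', 'C', 'D', 'A', 'B', 'E', 'D', 'F'],
    'count': [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

# Wide layout: one row per id, one column per class, NaN where a pair is absent.
d1 = df.set_index(['id', 'class'])['count'].unstack()

v = d1.to_numpy()
m = np.nanmax(v, axis=1)      # row-wise maximum, ignoring NaN
t = v == m[:, None]           # True where a cell holds its row's maximum

result = pd.DataFrame({
    'id': d1.index,
    # Boolean-index the column labels per row; works for multi-character names.
    'class': [','.join(d1.columns[row]) for row in t],
    # Safe cast: the maxima are whole numbers taken from the original counts.
    'count': m.astype(int),
})
print(result)
```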