2017-07-27 85 views

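Pandas: group by, count, and add the count to the original data frame?

When trying to count the rows in a data frame that share the same 'kind':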

import pandas as pd 

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'), 
     ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
     ('bb', 'bb text 4'), 
     ('cccc','cccc text 1'), ('cccc','cccc text 2'), 
     ('dd', 'dd text 1'), 
     ('e', 'e text 1'), 
     ('fff', 'fff text 1'), 
     ] 

df = pd.DataFrame(items, columns=['kind', 'msg']) 
df 

    kind          msg
0    aaa   aaa text 1
1    aaa   aaa text 2
2    aaa   aaa text 3
3     bb    bb text 1
4     bb    bb text 2
5     bb    bb text 3
6     bb    bb text 4
7   cccc  cccc text 1
8   cccc  cccc text 2
9     dd    dd text 1
10     e     e text 1
11   fff   fff text 1

This code:

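# Count the rows per kind, then keep the five most frequent kinds.
# Parentheses replace the backslash continuations.
df = (df[['kind']].groupby(['kind'])['kind']
         .count()
         .reset_index(name='count')
         .sort_values(['count'], ascending=False)
         .head(5))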

df 

results in:

   kind  count
0   aaa      1
1    bb      1
2  cccc      1
3    dd      1
4     e      1

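However, how can one get a data frame that contains all the original columns plus a 'count' column, so that the result has the columns in the order 'kind', 'msg', 'count'?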

Also, how can the resulting data frame be sorted by count in descending order?

Answer


IIUC

In [247]: df['count'] = df.groupby('kind').transform('count') 

In [248]: df 
Out[248]: 
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
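
The assignment above appears to work because 'msg' is the only non-key column, so transform('count') hands back a single column. If the frame had more columns, selecting one column before transforming would probably be safer. A minimal sketch, assuming the same 'kind'/'msg' frame as in the question; add_group_count is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def add_group_count(frame, key, name="count"):
    """Return a copy of frame with a column holding each row's group count.

    Hypothetical helper for illustration only.
    """
    out = frame.copy()
    # Selecting a single column makes transform return a Series that is
    # aligned to the original index, whatever other columns exist.
    out[name] = out.groupby(key)[key].transform("count")
    return out


items = [("aaa", "aaa text 1"), ("aaa", "aaa text 2"), ("aaa", "aaa text 3"),
         ("bb", "bb text 1"), ("bb", "bb text 2"), ("bb", "bb text 3"),
         ("bb", "bb text 4"),
         ("cccc", "cccc text 1"), ("cccc", "cccc text 2"),
         ("dd", "dd text 1"),
         ("e", "e text 1"),
         ("fff", "fff text 1")]

df = pd.DataFrame(items, columns=["kind", "msg"])
result = add_group_count(df, "kind")
print(result)
```

Because the helper works on a copy, the original df keeps only its 'kind' and 'msg' columns, and the new column is appended last, which gives the 'kind', 'msg', 'count' order asked for.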

Sorting:

In [249]: df.sort_values('count', ascending=False) 
Out[249]: 
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
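
Note that sort_values returns a new data frame rather than changing df, so the result has to be assigned to keep it. The default sort algorithm is also not guaranteed to keep tied rows in their original order. A rough sketch, assuming the same data as in the question, that asks for a stable sort explicitly; the name ranked is only illustrative:

```python
import pandas as pd

items = [("aaa", "aaa text 1"), ("aaa", "aaa text 2"), ("aaa", "aaa text 3"),
         ("bb", "bb text 1"), ("bb", "bb text 2"), ("bb", "bb text 3"),
         ("bb", "bb text 4"),
         ("cccc", "cccc text 1"), ("cccc", "cccc text 2"),
         ("dd", "dd text 1"),
         ("e", "e text 1"),
         ("fff", "fff text 1")]

df = pd.DataFrame(items, columns=["kind", "msg"])

# Per-row group count, computed from a single selected column.
df["count"] = df.groupby("kind")["kind"].transform("count")

# mergesort is a stable algorithm, so rows with equal counts keep the
# order they had in df; the result is assigned because sort_values
# does not modify df in place.
ranked = df.sort_values("count", ascending=False, kind="mergesort")
print(ranked)
```

Adding .reset_index(drop=True) to the end of the chain would renumber the rows from 0 if the original index labels are not needed.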