Groupby, count and average in NumPy, pandas in Python

I have a dataframe that looks like this:

 userId movieId rating 
0   1  31  2.5 
1   1  1029  3.0 
2   1  3671  3.0 
3   2  10  4.0 
4   2  17  5.0 
5   3  60  3.0 
6   3  110  4.0 
7   3  247  3.5 
8   4  10  4.0 
9   4  112  5.0 
10   5  3  4.0 
11   5  39  4.0 
12   5  104  4.0

I need to get a dataframe with each unique userId, the number of ratings for that user, and the user's mean rating, as shown below:

 userId count mean 
0   1  3 2.83 
1   2  2  4.5 
2   3  3  3.5 
3   4  2  4.5 
4   5  3  4.0

Can someone please help?

Source

2017-04-17 Anand T

df1 = df.groupby('userId')['rating'].agg(['count','mean']).reset_index() 
print(df1) 


    userId count  mean 
0  1  3 2.833333 
1  2  2 4.500000 
2  3  3 3.500000 
3  4  2 4.500000 
4  5  3 4.000000
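
As a variation on the one-liner above, pandas 0.25 and later also support named aggregation, which sets the output column names directly. This is a minimal sketch, assuming the same data as in the question (rebuilt here so the snippet runs on its own); the output names count and mean are simply chosen to match the desired result.

```python
import pandas as pd

df = pd.DataFrame({
    'userId': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    'movieId': [31, 1029, 3671, 10, 17, 60, 110, 247, 10, 112, 3, 39, 104],
    'rating': [2.5, 3.0, 3.0, 4.0, 5.0, 3.0, 4.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0],
})

# Named aggregation: output_name=(source column, aggregation function)
df1 = (
    df.groupby('userId')
      .agg(count=('rating', 'count'), mean=('rating', 'mean'))
      .reset_index()
)
print(df1)
```

Because count and mean are also DataFrame method names, access the resulting columns with brackets (df1['count']) rather than attribute syntax.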

Source

2017-04-17 17:49:55

Drop movieId since we are not using it, group by userId, and then apply the aggregation methods:

import pandas as pd 

df = pd.DataFrame({'userId': [1,1,1,2,2,3,3,3,4,4,5,5,5], 
        'movieId':[31,1029,3671,10,17,60,110,247,10,112,3,39,104], 
        'rating':[2.5,3.0,3.0,4.0,5.0,3.0,4.0,3.5,4.0,5.0,4.0,4.0,4.0]}) 

df = df.drop('movieId', axis=1).groupby('userId').agg(['count','mean']) 

print(df)

Which produces:

 rating   
     count  mean 
userId     
1   3 2.833333 
2   2 4.500000 
3   3 3.500000 
4   2 4.500000 
5   3 4.000000

Source

2017-04-17 17:22:13 Kewl

df.drop('movieId', axis=1).groupby('userId').rating.agg(['count', 'mean']) cleans up the multi-index. Plus one – piRSquared
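
To illustrate the comment above: aggregating the whole frame gives two-level column labels, while selecting rating first gives flat ones. This is a small sketch, assuming the data from the answer (rebuilt here so the snippet runs on its own); the variable names nested and flat are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'userId': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    'movieId': [31, 1029, 3671, 10, 17, 60, 110, 247, 10, 112, 3, 39, 104],
    'rating': [2.5, 3.0, 3.0, 4.0, 5.0, 3.0, 4.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0],
})

# Aggregating every remaining column: columns become a MultiIndex
nested = df.drop('movieId', axis=1).groupby('userId').agg(['count', 'mean'])

# Selecting the rating column first: columns stay flat
flat = df.groupby('userId')['rating'].agg(['count', 'mean'])

print(nested.columns.tolist())  # [('rating', 'count'), ('rating', 'mean')]
print(flat.columns.tolist())    # ['count', 'mean']
```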

Here's a NumPy-based approach that uses the fact that the userId column appears to be sorted -

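import numpy as np
import pandas as pd

# Unique ids, a group label for every row, and the number of rows per group
unq, tags, count = np.unique(df.userId.values, return_inverse=True, return_counts=True)
mean_vals = np.bincount(tags, df.rating.values) / count  # per-group rating sum / group size
df_out = pd.DataFrame(np.c_[unq, count], columns=['userID', 'count'])
df_out['mean'] = mean_vals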

Sample run -

In [103]: df 
Out[103]: 
    userId movieId rating 
0  1  31  2.5 
1  1  1029  3.0 
2  1  3671  3.0 
3  2  10  4.0 
4  2  17  5.0 
5  3  60  3.0 
6  3  110  4.0 
7  3  247  3.5 
8  4  10  4.0 
9  4  112  5.0 
10  5  3  4.0 
11  5  39  4.0 
12  5  104  4.0 

In [104]: df_out 
Out[104]: 
    userID count  mean 
0  1  3 2.833333 
1  2  2 4.500000 
2  3  3 3.500000 
3  4  2 4.500000 
4  5  3 4.000000
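
For readers unfamiliar with the weighted form of np.bincount used above, here is a tiny sketch on made-up arrays (the names ids and vals are hypothetical and not part of the question). np.unique returns the sorted unique values along with the inverse mapping, so the toy input is left unsorted here.

```python
import numpy as np

ids = np.array([2, 1, 2, 1, 1])               # hypothetical group ids
vals = np.array([4.0, 2.5, 5.0, 3.0, 3.0])    # hypothetical values to average

# unq: sorted unique ids; tags: index into unq for each row; count: rows per id
unq, tags, count = np.unique(ids, return_inverse=True, return_counts=True)

# With weights, bincount sums vals within each tag instead of counting rows
sums = np.bincount(tags, weights=vals)
means = sums / count

print(unq)    # [1 2]
print(count)  # [3 2]
print(means)
```

The per-group sum divided by the per-group row count is the group mean, which is the same computation as groupby followed by mean.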

Source

2017-04-17 18:10:25 Divakar
