Complex groupby operation in Pandas to capture many-to-one scenarios

Below is a small sample of my data frame, which has millions of rows. It represents Send_Customers sending money to Pay_Customers.

In [14]: df1
Out[14]:
         Send_Customer         Pay_Customer
0  1000000000009548332  2000000000087113758
1  1000000000072327616  2000000000087113758
2  1000000000081537869  2000000000087113758
3  1000000000007725765  2000000000078800989
4  1000000000031950290  2000000000078800989
5  1000000000082570417  2000000000078800989
6  1000000000009548332  1000000000142041382
7  1000000000072327616  1000000000142041382
8  2000000000097181041  1000000000004033594

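I need to store a count for each Send_Customer that takes part in a many-to-one scenario.

For example, Pay_Customers 2000000000087113758, 2000000000078800989 and 1000000000142041382 are each receiving money from multiple Send_Customers. So each Send_Customer that sends money to one of them gets a "count" of 1.

Send_Customers 1000000000009548332 and 1000000000072327616 each take part in two many-to-one scenarios, with Pay_Customers 2000000000087113758 and 1000000000142041382, so their cumulative "count" should be 2.

Thanks in advance!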

Source

2016-08-05 mysterious_guy

You can use groupby:

print(df1.groupby('Send_Customer')['Pay_Customer'].count())

Output:

Send_Customer 
1000000000007725765 1 
1000000000009548332 2 
1000000000031950290 1 
1000000000072327616 2 
1000000000081537869 1 
1000000000082570417 1 
2000000000097181041 1
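A side note on `count()` versus `nunique()`: `count()` counts rows, so if the full data contains repeated transfers between the same pair of customers (an assumption; the sample has none), a sender's count would grow with each repeat. A minimal sketch on a hypothetical toy frame, not the poster's data, shows the difference:

```python
import pandas as pd

# Hypothetical toy data: sender 1 pays payee 10 twice and payee 20 once
toy = pd.DataFrame({
    "Send_Customer": [1, 1, 1, 2],
    "Pay_Customer": [10, 10, 20, 10],
})

# count() counts rows per sender, so the repeated (1, 10) transfer counts twice
row_counts = toy.groupby("Send_Customer")["Pay_Customer"].count()

# nunique() counts distinct payees per sender, ignoring repeats
distinct_counts = toy.groupby("Send_Customer")["Pay_Customer"].nunique()

print(row_counts)
print(distinct_counts)
```

If each sender/payee pair appears at most once, the two give the same result.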

Based on your comment, if you only want to keep the rows where count is greater than 1, you can do this instead:

df1 = df1.groupby('Send_Customer')['Pay_Customer'].count().reset_index(name="count") 
df1 = df1[df1['count'] > 1]

This yields:

1 1000000000009548332  2 
3 1000000000072327616  2

Source

2016-08-05 02:03:13

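Hi. My data frame has millions of rows; the above is only a small sample. Sorry I did not mention this earlier. I only need counts for the customers who take part in many-to-one scenarios. So in this example there is no need to count Send_Customer 2000000000097181041, because its Pay_Customer is not involved in a many-to-one scenario.

@mysterious_guy Please see my edit.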
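One way to read the requirement in the comment above is: first keep only the transfers into payees that have more than one distinct sender, then count, per sender, how many such payees they pay into. The following is a minimal sketch under that reading, not the answerer's method. The names `senders_per_payee`, `many_to_one` and `counts` are made up for illustration, and the rows are copied from the question's sample.

```python
import pandas as pd

# Sample rows copied from the question
df1 = pd.DataFrame({
    "Send_Customer": [
        1000000000009548332, 1000000000072327616, 1000000000081537869,
        1000000000007725765, 1000000000031950290, 1000000000082570417,
        1000000000009548332, 1000000000072327616, 2000000000097181041,
    ],
    "Pay_Customer": [
        2000000000087113758, 2000000000087113758, 2000000000087113758,
        2000000000078800989, 2000000000078800989, 2000000000078800989,
        1000000000142041382, 1000000000142041382, 1000000000004033594,
    ],
})

# Number of distinct senders per payee, broadcast back onto every row
senders_per_payee = df1.groupby("Pay_Customer")["Send_Customer"].transform("nunique")

# Keep only transfers into payees that receive from more than one sender
many_to_one = df1[senders_per_payee > 1]

# Per sender, count the distinct many-to-one payees they pay into
counts = (
    many_to_one.groupby("Send_Customer")["Pay_Customer"]
    .nunique()
    .rename("count")
)
print(counts)
```

Under this reading, Send_Customer 2000000000097181041 drops out of the result because its only payee has a single sender. Both steps are vectorized groupby operations, which matters at millions of rows.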
