Python: sum the values of a third column when two columns have the same values

I have the following DataFrame df:

df
      a    b    i
0   1.0  3.0  2.0
1   1.0  3.0  3.0
2   1.0  3.0  1.0
3   1.0  3.0  3.0
4   1.0  3.0  7.0
5   1.0  3.0  8.0
6   1.0  4.0  4.0
7   1.0  4.0  0.0
8   1.0  3.0  2.0
9   1.0  3.0  1.0
10  1.0  3.0  2.0
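To run the answers on this data, df can be rebuilt from the printed table. This is a minimal sketch that assumes the table above is the complete data; the construction itself is not part of the question.

```python
import pandas as pd

# Values copied column by column from the printed table (assumed to be the full data)
df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})
print(df)
```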

I want to get the sum of i over each run of rows that share the same pair of a and b, so that:

df2
     a    b     i
0  1.0  3.0  24.0
1  1.0  4.0   4.0
2  1.0  3.0   5.0

df2 = df2.groupby(['a', 'b']).sum(['i']).reset_index()


2016-11-29 emax

I think you need to add the column i after groupby, so that sum is applied only to it:

df2 = df.groupby(['a', 'b'])['i'].sum().reset_index()
print(df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0

Or add the parameter as_index=False to get a DataFrame back directly:

df2 = df.groupby(['a', 'b'], as_index=False)['i'].sum()
print(df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0

Another solution is to group the Series i by the columns a and b:

df2 = df.i.groupby([df.a, df.b]).sum().reset_index()
print(df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0
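A quick way to confirm that the three variants above are interchangeable is to run them side by side. This sketch rebuilds df from the question's table (an assumption) and uses the illustrative names r1, r2 and r3. All three group every (a, b) row regardless of position, which is why (1.0, 3.0) sums to 29 here rather than being split into separate blocks.

```python
import pandas as pd

# Data rebuilt from the question's table (assumed complete)
df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

# Variant 1: select i after groupby, sum, then turn the (a, b) index back into columns
r1 = df.groupby(['a', 'b'])['i'].sum().reset_index()
# Variant 2: as_index=False keeps a and b as ordinary columns from the start
r2 = df.groupby(['a', 'b'], as_index=False)['i'].sum()
# Variant 3: group the Series i by the Series a and b
r3 = df['i'].groupby([df['a'], df['b']]).sum().reset_index()

print(r1)
```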

EDIT:

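If you need to group by consecutive runs instead (a new group starts whenever a or b differs from the previous row), build a helper Series g that labels each run and use it with groupby and agg: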

ab = df[['a', 'b']]

# compare each row with the previous (shifted) row
print(ab.ne(ab.shift()))
        a      b
0    True   True
1   False  False
2   False  False
3   False  False
4   False  False
5   False  False
6   False   True
7   False  False
8   False   True
9   False  False
10  False  False

# True if at least one of a, b changed
print(ab.ne(ab.shift()).any(axis=1))
0      True
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8      True
9     False
10    False
dtype: bool

# the cumulative sum of the boolean Series gives one label per consecutive block
g = ab.ne(ab.shift()).any(axis=1).cumsum()
print(g)
0     1
1     1
2     1
3     1
4     1
5     1
6     2
7     2
8     3
9     3
10    3
dtype: int32

print(df.groupby(g).agg(dict(a='first', b='first', i='sum')))
     a    b     i
1  1.0  3.0  24.0
2  1.0  4.0   4.0
3  1.0  3.0   5.0
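The consecutive-run steps can also be collected into one self-contained script. This is a sketch that assumes df holds the question's table; new_block and out are illustrative names, not part of the answer.

```python
import pandas as pd

# Data rebuilt from the question's table (assumed complete)
df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

ab = df[['a', 'b']]
# True where a or b differs from the row above; row 0 is compared with NaN, so it is True
new_block = ab.ne(ab.shift()).any(axis=1)
# The cumulative sum turns block starts into labels 1, 2, 3, ...
g = new_block.cumsum()

# a and b are constant inside a block, so 'first' recovers them; i is summed
out = df.groupby(g).agg({'a': 'first', 'b': 'first', 'i': 'sum'})
print(out)
```

Because a and b do not change within a block, 'first' and 'last' give the same values here.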


2016-11-29 22:02:48 jezrael

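Compare each row's (a, b) combination with the previous row's to see whether it changed, and take the cumsum of that to build an array of group labels:

ab = df[['a', 'b']].apply(tuple, axis=1)

df.groupby(ab.ne(ab.shift()).cumsum()) \
    .agg(dict(a='last', b='last', i='sum')) \
    .reindex(columns=df.columns)

Breakdown

ab = df[['a', 'b']].apply(tuple, axis=1)
- I make a Series of tuples so I can see whether the combination changes.
ab.ne(ab.shift())
- Checks whether each tuple differs from the previous tuple.
ab.ne(ab.shift()).cumsum()
- Each True adds 1 to the cumulative sum. This creates a convenient group label for each contiguous block of identical a and b pairs.
.agg(dict(a='last', b='last', i='sum'))
- Specifies what to do with each column in each group: take the last value of a and b, which is fine because I know it is the same throughout the group, and sum column i.
.reindex(columns=df.columns)
- Puts my columns back in their original order.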
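The breakdown can be checked by printing the intermediate Series next to each other. A sketch, assuming df holds the question's table; changed, key and out are illustrative names.

```python
import pandas as pd

# Data rebuilt from the question's table (assumed complete)
df = pd.DataFrame({
    'a': [1.0] * 11,
    'b': [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    'i': [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

# Step 1: one (a, b) tuple per row
ab = df[['a', 'b']].apply(tuple, axis=1)
# Step 2: True where the tuple differs from the previous row's tuple (row 0 vs NaN -> True)
changed = ab.ne(ab.shift())
# Step 3: running count of changes = one label per contiguous block
key = changed.cumsum()

out = (
    df.groupby(key)
      .agg({'a': 'last', 'b': 'last', 'i': 'sum'})
      .reindex(columns=df.columns)  # keep the original a, b, i column order
)

print(pd.DataFrame({'ab': ab, 'changed': changed, 'key': key}))
print(out)
```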


2016-11-29 22:08:56 piRSquared
