
Pandas: computing the difference from a grouped average

I have sensor data from multiple sensors, organized by month and year:

import pandas as pd 
df = pd.DataFrame([ 
['A', 'Jan', 2015, 13], 
['A', 'Feb', 2015, 10], 
['A', 'Jan', 2016, 12], 
['A', 'Feb', 2016, 11], 
['B', 'Jan', 2015, 7], 
['B', 'Feb', 2015, 8], 
['B', 'Jan', 2016, 4], 
['B', 'Feb', 2016, 9] 
], columns = ['sensor', 'month', 'year', 'value']) 

In [2]: df 
Out[2]: 
  sensor month  year  value
0      A   Jan  2015     13
1      A   Feb  2015     10
2      A   Jan  2016     12
3      A   Feb  2016     11
4      B   Jan  2015      7
5      B   Feb  2015      8
6      B   Jan  2016      4
7      B   Feb  2016      9

I compute the average for each sensor and month with groupby:

month_avg = df.groupby(['sensor', 'month']).mean()['value'] 

In [3]: month_avg 
Out[3]: 
sensor  month
A       Feb      10.5
        Jan      12.5
B       Feb       8.5
        Jan       5.5

Now I want to add a column to df holding the difference from the monthly average, like this:

  sensor month  year  value  diff_from_avg
0      A   Jan  2015     13            1.5
1      A   Feb  2015     10            2.5
2      A   Jan  2016     12            0.5
3      A   Feb  2016     11            0.5
4      B   Jan  2015      7            2.5
5      B   Feb  2015      8            0.5
6      B   Jan  2016      4           -1.5
7      B   Feb  2016      9           -0.5

I thought about multi-indexing df the same way as month_avg and trying a simple subtraction, but with no luck:

df = df.set_index(['sensor', 'month']) 
df['diff_from_avg'] = month_avg - df.value 

Thanks for any suggestions.

Answers


Use assign together with transform:

diff_from_avg=df.value - df.groupby(['sensor', 'month']).value.transform('mean') 
df.assign(diff_from_avg=diff_from_avg) 

  sensor month  year  value  diff_from_avg
0      A   Jan  2015     13            0.5
1      A   Feb  2015     10           -0.5
2      A   Jan  2016     12           -0.5
3      A   Feb  2016     11            0.5
4      B   Jan  2015      7            1.5
5      B   Feb  2015      8           -0.5
6      B   Jan  2016      4           -1.5
7      B   Feb  2016      9            0.5

Of course! Too quick for me! I should start using 'assign', if only to write out answers faster! –


This looks good, but I'm getting an unhelpful error on the first line: 'AttributeError: 'NoneType' object has no attribute 'transform''. Any idea what that might mean? – robroc


@juanpa.arrivillaga I use 'assign' because I don't like breaking 'df', especially when I may very well chain operations. – piRSquared
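
For context: groupby(...).transform('mean') returns a Series with the same length and index as the original frame (each row gets its own group's mean), which is why the subtraction in this answer works without any reindexing. A minimal sketch reusing the df from the question (the names group_mean and result are only for illustration):

group_mean = df.groupby(['sensor', 'month'])['value'].transform('mean')  # one mean per row, aligned to df's index
result = df.assign(diff_from_avg=df['value'] - group_mean)               # assign returns a new frame; df itself is untouched
print(result)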


You need to set the DataFrame's index so that it lines up with the grouped series; then you can subtract directly:

df.set_index(['sensor', 'month'], inplace=True) 
df['diff'] = df['value'] - month_avg
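
Note that inplace=True leaves df with a (sensor, month) MultiIndex afterwards. If you would rather keep the original flat frame, a small variation that works on a copy (the names indexed and df_with_diff are only for illustration) would be:

indexed = df.set_index(['sensor', 'month'])       # same keys as month_avg, so the two Series align
indexed['diff'] = indexed['value'] - month_avg    # index-aligned subtraction: one group mean per (sensor, month)
df_with_diff = indexed.reset_index()              # restore the flat integer index; df itself stays untouched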


Try:

df['diff_from_avg']=df.groupby(['sensor','month'])['value'].apply(lambda x: x-x.mean()) 
Out[18]: 
  sensor month  year  value  diff_from_avg
0      A   Jan  2015     13            0.5
1      A   Feb  2015     10           -0.5
2      A   Jan  2016     12           -0.5
3      A   Feb  2016     11            0.5
4      B   Jan  2015      7            1.5
5      B   Feb  2015      8           -0.5
6      B   Jan  2016      4           -1.5
7      B   Feb  2016      9            0.5
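
This apply with a lambda gives the same per-row deviation as the transform('mean') approach above; transform is usually a bit faster because the built-in 'mean' aggregation runs once per group rather than through a Python lambda. A quick sanity check one could run, assuming the df from the question:

via_transform = df['value'] - df.groupby(['sensor', 'month'])['value'].transform('mean')
via_apply = df.groupby(['sensor', 'month'], group_keys=False)['value'].apply(lambda x: x - x.mean())
print(via_apply.sort_index().equals(via_transform.sort_index()))  # True: same values on the same index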