2017-10-09

How to get the mean per group using Python pandas?

I have input like:

 
NAME   Geoid Year QTR Index 
'Abilene, TX 10180 1978 3 0 
'Abilene, TX 10180 1978 4 0 
'Abilene, TX 10180 1979 1 0 
'Abilene, TX 10180 1979 2 0 
'Decatur, IL 19500 1998 1 110.51 
'Decatur, IL 19500 1998 2 110.48 
'Decatur, IL 19500 1998 3 113.01 
'Decatur, IL 19500 1998 4 114.16 
'Fairbanks, AK 21820 1990 1 63.74 
'Fairbanks, AK 21820 1990 2 70.68 
'Fairbanks, AK 21820 1990 3 83.56 
'Fairbanks, AK 21820 1990 4 83.95 

I want to convert a MySQL query like this one into Python:

SELECT geoid, name, YEAR, AVG(`index`) 
    FROM table_1 
    WHERE geoid > 0 
    GROUP BY geoid, name, YEAR; 

I read online that the Python equivalent of AVG is mean, but when I use mean it gives me a single value.

pandas get column average/mean
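The single value comes from calling `mean()` on a whole column, which collapses it to one scalar; `AVG()` with `GROUP BY` corresponds to calling `mean()` after `groupby()`. A minimal sketch of the difference, using only a hand-typed subset of the sample rows above (the subset is an assumption for brevity):

```python
import pandas as pd

# Hand-typed subset of the sample rows, for illustration only
df = pd.DataFrame({
    "NAME": ["'Abilene, TX", "'Abilene, TX", "'Decatur, IL", "'Decatur, IL"],
    "Year": [1978, 1979, 1998, 1998],
    "Index": [0.0, 0.0, 110.51, 110.48],
})

# Column-wide mean: the whole column collapses into one scalar
overall = df["Index"].mean()
print(overall)

# Grouped mean: one value per (NAME, Year) group, like AVG(...) ... GROUP BY
per_group = df.groupby(["NAME", "Year"])["Index"].mean()
print(per_group)
```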

But I want the output grouped by year, averaging over the quarters of each year, like:

 
Name   Geoid YEAR AVG(index) 
'Abilene, TX 10180 1978 0 
'Abilene, TX 10180 1979 0 
'Decatur, IL 19500 1998 112.04 
'Fairbanks, AK 21820 1990 75.4825 

How can I achieve this?

Answer


Filter first, using either query or boolean indexing, then groupby and aggregate with mean:

# query() version: WHERE Geoid > 0, then mean of Index per (NAME, Geoid, Year)
df1 = df.query('Geoid > 0').groupby(['NAME','Geoid','Year'], as_index=False)['Index'].mean() 
print(df1) 
      NAME Geoid Year  Index 
0 'Abilene, TX 10180 1978 0.0000 
1 'Abilene, TX 10180 1979 0.0000 
2 'Decatur, IL 19500 1998 112.0400 
3 'Fairbanks, AK 21820 1990 75.4825 
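The snippet above assumes `df` already exists. A self-contained sketch that rebuilds it from the question's sample rows (typed in by hand, so the apostrophe-prefixed names are kept as they appear in the question) and runs the same `query` + `groupby` chain:

```python
import pandas as pd

# Sample data from the question, typed in by hand
rows = [
    ("'Abilene, TX", 10180, 1978, 3, 0.0),
    ("'Abilene, TX", 10180, 1978, 4, 0.0),
    ("'Abilene, TX", 10180, 1979, 1, 0.0),
    ("'Abilene, TX", 10180, 1979, 2, 0.0),
    ("'Decatur, IL", 19500, 1998, 1, 110.51),
    ("'Decatur, IL", 19500, 1998, 2, 110.48),
    ("'Decatur, IL", 19500, 1998, 3, 113.01),
    ("'Decatur, IL", 19500, 1998, 4, 114.16),
    ("'Fairbanks, AK", 21820, 1990, 1, 63.74),
    ("'Fairbanks, AK", 21820, 1990, 2, 70.68),
    ("'Fairbanks, AK", 21820, 1990, 3, 83.56),
    ("'Fairbanks, AK", 21820, 1990, 4, 83.95),
]
df = pd.DataFrame(rows, columns=["NAME", "Geoid", "Year", "QTR", "Index"])

# WHERE Geoid > 0 -> query(); GROUP BY NAME, Geoid, Year -> groupby(); AVG -> mean()
df1 = (
    df.query("Geoid > 0")
      .groupby(["NAME", "Geoid", "Year"], as_index=False)["Index"]
      .mean()
)
print(df1)
```

Because QTR is not a grouping key, the four quarters of each year are averaged together, which is what produces one row per year.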

# boolean indexing version: same filter and aggregation, same result
df1 = df[df['Geoid'] > 0].groupby(['NAME','Geoid','Year'], as_index=False)['Index'].mean() 
print(df1) 
      NAME Geoid Year  Index 
0 'Abilene, TX 10180 1978 0.0000 
1 'Abilene, TX 10180 1979 0.0000 
2 'Decatur, IL 19500 1998 112.0400 
3 'Fairbanks, AK 21820 1990 75.4825
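If you also want the output column to carry its own name, as the `AVG(index)` header in the question does, named aggregation in `agg()` is one option (available in pandas 0.25 and later). A minimal sketch, assuming a small frame with the Decatur rows plus one made-up row with `Geoid` 0 (hypothetical, not in the question's data) so the boolean mask has something to drop; `avg_index` is an arbitrary name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    # last row is hypothetical, added only to exercise the Geoid > 0 filter
    "NAME": ["'Decatur, IL"] * 4 + ["'Unknown"],
    "Geoid": [19500] * 4 + [0],
    "Year": [1998] * 5,
    "Index": [110.51, 110.48, 113.01, 114.16, 999.0],
})

# Boolean mask plays the role of WHERE geoid > 0
mask = df["Geoid"] > 0

# Named aggregation: output column name -> (input column, aggregation)
result = (
    df[mask]
    .groupby(["NAME", "Geoid", "Year"])
    .agg(avg_index=("Index", "mean"))
    .reset_index()  # turn the group keys back into ordinary columns
)
print(result)
```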