2016-01-24

Fill in missing values in pandas dataframe using mean: groupby while keeping all string columns

For the pandas dataframe:

datetime   col_A    col_B
1/1/2012   125.501  A
1/2/2012   NaN      A
1/3/2012   125.501  A
1/4/2013   NaN      A
1/5/2013   125.501  B
2/28/2013  125.501  B
2/28/2014  125.501  B
1/2/2016   125.501  B
1/4/2016   125.501  B
2/28/2016  NaN      B

I fill in the missing values in col_A like this:

df = df.groupby([df.index.month, df.index.day]).transform(lambda x: x.fillna(x.mean())) 

However, when I do this, col_B disappears. How can I keep col_B, which contains only strings?
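For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. It assumes the `datetime` column is meant to be the index (the grouping keys `df.index.month` and `df.index.day` only work on a `DatetimeIndex`) and that the dates are written month/day/year.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the 'datetime' strings are parsed as
# month/day/year (an assumption) and moved into a DatetimeIndex.
df = pd.DataFrame(
    {
        "datetime": ["1/1/2012", "1/2/2012", "1/3/2012", "1/4/2013", "1/5/2013",
                     "2/28/2013", "2/28/2014", "1/2/2016", "1/4/2016", "2/28/2016"],
        "col_A": [125.501, np.nan, 125.501, np.nan, 125.501,
                  125.501, 125.501, 125.501, 125.501, np.nan],
        "col_B": ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    }
)
df["datetime"] = pd.to_datetime(df["datetime"], format="%m/%d/%Y")
df = df.set_index("datetime")

# col_A should be float64 with three NaNs; col_B is an object (string) column.
print(df.dtypes)
```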


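On the left side you need `df['col_A'] =`, not just `df =`. You are replacing the entire dataframe with one column; just replace that column. It doesn't matter here, but I would also specify 'col_A' on the right side rather than relying on `mean` to ignore 'col_B'. – JohnE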


If I have multiple columns like 'col_A', would this also work: `df[['col_A','col_C']] = ...`? – user308827
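On the multi-column follow-up: selecting a list of columns on the right-hand side and assigning to the same list on the left should work the same way. Below is a minimal sketch; `col_C` and the four-row frame are hypothetical, invented only to show the pattern, and it assumes the transformed result keeps the column order of the list (which `transform` does).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: col_C is an extra numeric column made up for this sketch;
# col_B holds strings and must come through unchanged.
idx = pd.to_datetime(["2012-01-01", "2012-01-02", "2016-01-01", "2016-01-02"])
df = pd.DataFrame(
    {
        "col_A": [1.0, np.nan, 3.0, 4.0],
        "col_B": ["A", "A", "B", "B"],
        "col_C": [np.nan, 10.0, 30.0, 20.0],
    },
    index=idx,
)

cols = ["col_A", "col_C"]
keys = [df.index.month, df.index.day]

# Only the listed numeric columns are transformed and written back,
# so col_B is never passed to mean() and is never overwritten.
df[cols] = df.groupby(keys)[cols].transform(lambda x: x.fillna(x.mean()))
print(df)
```

Here the NaN in col_A on 2012-01-02 is filled from 2016-01-02 (4.0), and the NaN in col_C on 2012-01-01 is filled from 2016-01-01 (30.0).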

Answer


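I think you can add `['col_A']` after the groupby and assign the result back to `df['col_A']`, so that only that column is replaced: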

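# Fill NaN in col_A with the mean of rows sharing the same month and day;
# only col_A is reassigned, so col_B is kept.
df['col_A'] = (df.groupby([df.index.month, df.index.day])['col_A']
                 .transform(lambda x: x.fillna(x.mean())))
print(df)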
              col_A col_B
datetime
2012-01-01  125.501     A
2012-01-02  125.501     A
2012-01-03  125.501     A
2013-01-04  125.501     A
2013-01-05  125.501     B
2013-02-28  125.501     B
2014-02-28  125.501     B
2016-01-02  125.501     B
2016-01-04  125.501     B
2016-02-28  125.501     B
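A variant some may prefer (a swapped-in technique, not the one used in the answer): compute the per-(month, day) mean once with the built-in `transform('mean')`, then pass it to `fillna`. This avoids calling a Python lambda for every group; the sketch below rebuilds the question's sample data under the same month/day/year assumption.

```python
import numpy as np
import pandas as pd

# Question's sample data, with the dates parsed (month/day/year assumed)
# into a DatetimeIndex named 'datetime'.
idx = pd.to_datetime(
    ["1/1/2012", "1/2/2012", "1/3/2012", "1/4/2013", "1/5/2013",
     "2/28/2013", "2/28/2014", "1/2/2016", "1/4/2016", "2/28/2016"],
    format="%m/%d/%Y",
).rename("datetime")
df = pd.DataFrame(
    {
        "col_A": [125.501, np.nan, 125.501, np.nan, 125.501,
                  125.501, 125.501, 125.501, 125.501, np.nan],
        "col_B": list("AAAABBBBBB"),
    },
    index=idx,
)

# Mean of col_A per (month, day) group, broadcast back to every row;
# mean() skips NaN, so each group mean uses only the non-missing values.
group_mean = df.groupby([df.index.month, df.index.day])["col_A"].transform("mean")

# Fill only the gaps in col_A; existing values and col_B stay as they are.
df["col_A"] = df["col_A"].fillna(group_mean)
print(df)
```

As with the lambda version, a (month, day) group whose col_A values are all NaN has a NaN mean, so those rows stay missing.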