2016-07-22 124 views
3

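Using pandas first_valid_index() to get the index of a column's first non-null value, how can I shift a single value of a column rather than the whole column? In other words, how do I shift a single value of a pandas DataFrame column?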

import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}

df = pd.DataFrame(data)
df = df.set_index('year')
print(df)
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10       10    400
2014       39       39    500
2015       30       30    600
2016       31       31    700
2017       45       45    800
2018       23       23    900
2019       56       56   1000

for col in df.columns:
    if col not in ['total']:
        idx = df[col].first_valid_index()
        # df.loc[idx, 'total'] is a scalar, not a Series, so .shift() fails here
        df.loc[idx, col] = df.loc[idx, col] + df.loc[idx, 'total'].shift(1)

print(df)

AttributeError: 'numpy.float64' object has no attribute 'shift' 

Desired result:

print(df)
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10      310    400
2014       39       39    500
2015       30       30    600
2016       31       31    700
2017       45       45    800
2018       23       23    900
2019       56       56   1000

Answers

1

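You can restrict the loop to the columns that contain at least one NaN value. Build the set of columns to skip as the union of total and the columns that have no NaN: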

for col in df.columns:
    if col not in pd.Index(['total']).union(df.columns[~df.isnull().any()]):
        idx = df[col].first_valid_index()
        df.loc[idx, col] += df.total.shift().loc[idx]
print(df)
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10    310.0    400
2014       39     39.0    500
2015       30     30.0    600
2016       31     31.0    700
2017       45     45.0    800
2018       23     23.0    900
2019       56     56.0   1000
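
The column selection can also be kept separate from the update. The following is only a minimal sketch under the question's data; the helper name bump_first_valid and its total_col parameter are hypothetical, not part of pandas. It works on a copy and skips columns that are entirely NaN.

```python
import pandas as pd


def bump_first_valid(df, total_col="total"):
    """Hypothetical helper: add the previous row's total to the first
    non-NaN value of every column that contains at least one NaN."""
    out = df.copy()
    prev_total = out[total_col].shift()   # previous row's total; NaN in the first row
    has_nan = out.isnull().any()          # boolean Series indexed by column name
    for col in out.columns[has_nan]:
        if col == total_col:
            continue
        idx = out[col].first_valid_index()
        if idx is None:                   # column is entirely NaN, nothing to update
            continue
        out.loc[idx, col] += prev_total.loc[idx]
    return out


data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

result = bump_first_valid(df)
print(result)
```

Because the helper returns a copy, the original df keeps its value of 10 at (2013, columnB).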
+0

Is total always the last column? – jezrael

+0

Or better, is it possible for the 'total' column to contain 'NaN' values? – jezrael

+0

Yes, total can have NaN values – ArchieTiger

2

Is this what you want?

In [63]: idx = df.columnB.first_valid_index() 

In [64]: df.loc[idx, 'columnB'] += df.total.shift().loc[idx] 

In [65]: df 
Out[65]: 
      columnA  columnB  total
year
2010       10      NaN    100
2011       21      NaN    200
2012       20      NaN    300
2013       10    310.0    400
2014       39     39.0    500
2015       30     30.0    600
2016       31     31.0    700
2017       45     45.0    800
2018       23     23.0    900
2019       56     56.0   1000

UPDATE: starting from pandas 0.20.1, the .ix indexer is deprecated in favor of the more strict .iloc and .loc indexers.
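
One caveat: shift() leaves NaN in the first row, so applying the same update to a column whose first valid value sits in the first row (columnA here) produces NaN. The following is a minimal sketch of one possible workaround, assuming a missing previous total should count as 0. That assumption is not stated in the question, and the fill_value argument of shift() requires pandas 0.24 or later.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

# Assumption: the row before the first one contributes a total of 0.
prev_total = df['total'].shift(fill_value=0)

for col in df.columns.drop('total'):
    idx = df[col].first_valid_index()
    if idx is not None:                   # skip columns that are entirely NaN
        df.loc[idx, col] += prev_total.loc[idx]

print(df)
```

With this variant columnA is left unchanged at 2010, while columnB still receives the previous total at 2013.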

+0

Yes, but I get 'nan' for 'columnA' – ArchieTiger

+0

'for col in df.columns: if col not in ['total']: idx = df[col].first_valid_index() print df.ix[idx, col] + df.total.shift().ix[idx]' – ArchieTiger

+0

@ArchieTiger, why are you using a for loop? – Merlin