2017-08-10 43 views
2

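Pandas - Sum series of columns 1-N

I have a set of spend columns for weeks 1 through 52. I want to sum the first 26 and the last 26 separately.

I have the following: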

column_names = [x for x in df.columns if x.startswith("spend_")]

This gives me all the columns I'm interested in:

['spend_1', 'spend_2', 'spend_3', 'spend_4', 'spend_5', ...]

I can then sum them as follows:

df['pre_spend'] = df[column_names].sum(axis=1) 

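This gives me the total across all 52 weeks.

Is there a simple way to select columns 1-26 and 27-52 and sum each group separately?

In SAS I would do this: pre_spend = sum(of spend_1-spend_26);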

+0

Can you provide a sample dataset? – Travis

+0

You can use [column slicing](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection) to sum parts of it. You should take the time to watch this [Pandas From The Ground Up](http://pandas.pydata.org/talks.html#pycon-us-2015) talk. – wwii

Answers

2

I think you need to select the columns by label with DataFrame.loc:

a = df.loc[:, 'spend_1':'spend_26'].sum(axis=1) 

b = df.loc[:, 'spend_27':'spend_52'].sum(axis=1) 
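
Two properties of label slicing are worth checking before relying on it: a `.loc` slice includes both endpoints, and it follows the order in which the columns appear in the frame rather than the numeric order of the week suffix. The following is a minimal sketch under stated assumptions: the frame, the `customer_id` column, and the `pre_spend`/`post_spend` names are hypothetical, and the spend columns are assumed to be stored in week order.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an id column plus spend_1 .. spend_52 stored in week order.
weeks = [f"spend_{i}" for i in range(1, 53)]
df = pd.DataFrame(np.ones((3, 52), dtype=int), columns=weeks)
df.insert(0, "customer_id", [101, 102, 103])

# Label slices with .loc include BOTH endpoints (unlike positional slices),
# and they select whatever columns lie between the two labels in the frame.
first_half = df.loc[:, "spend_1":"spend_26"]
second_half = df.loc[:, "spend_27":"spend_52"]

# Each half should contain exactly 26 columns if the order is as assumed.
print(first_half.shape[1], second_half.shape[1])

df["pre_spend"] = first_half.sum(axis=1)
df["post_spend"] = second_half.sum(axis=1)
```

If the column counts printed here are not 26 and 26 on real data, the columns are probably not in week order and the slice is picking up something else.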

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,6))).add_prefix('spend_')
print(df)
   spend_0  spend_1  spend_2  spend_3  spend_4  spend_5
0        8        8        3        7        7        0
1        4        2        5        2        2        2
2        1        0        8        4        0        9
3        6        2        4        1        5        3
4        4        4        3        7        1        1

print(df.loc[:, 'spend_0':'spend_2'])
   spend_0  spend_1  spend_2
0        8        8        3
1        4        2        5
2        1        0        8
3        6        2        4
4        4        4        3

a = df.loc[:, 'spend_0':'spend_2'].sum(axis=1)
print(a)
0    19
1    11
2     9
3    12
4    11
dtype: int64

print(df.loc[:, 'spend_3':'spend_5'])
   spend_3  spend_4  spend_5
0        7        7        0
1        2        2        2
2        4        0        9
3        1        5        3
4        7        1        1

b = df.loc[:, 'spend_3':'spend_5'].sum(axis=1)
print(b)
0    14
1     6
2    13
3     9
4     9
dtype: int64
0

Thanks, Jezrael. That works better than what I came up with here:

column_names = [x for x in df.columns if x.startswith("spend_")]

pre = df.loc[:, column_names[:26]].sum(axis=1)
post = df.loc[:, column_names[26:]].sum(axis=1)
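
Taking the first 26 names from the list assumes the spend columns appear in week order in the frame. If that is not guaranteed, one alternative is to build the names from the week numbers, which is close in spirit to the SAS `spend_1-spend_26` range. This is only a sketch: the helper `spend_cols` and the small six-week frame are hypothetical, chosen to keep the example short.

```python
import pandas as pd

# Hypothetical frame whose spend columns are deliberately NOT in week order.
df = pd.DataFrame(
    [[6, 1, 4, 2, 5, 3], [60, 10, 40, 20, 50, 30]],
    columns=["spend_6", "spend_1", "spend_4", "spend_2", "spend_5", "spend_3"],
)

def spend_cols(first_week, last_week):
    """Return the names spend_<first_week> .. spend_<last_week>, inclusive."""
    return [f"spend_{week}" for week in range(first_week, last_week + 1)]

# Selecting by an explicit list of names does not depend on column order.
# With 52 weeks the calls would be spend_cols(1, 26) and spend_cols(27, 52).
pre = df[spend_cols(1, 3)].sum(axis=1)
post = df[spend_cols(4, 6)].sum(axis=1)

print(pre.tolist(), post.tolist())
```

Selecting with a list of names raises a `KeyError` if any expected week is missing, which can be a useful check on incomplete data.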