2016-12-15 95 views

How do I add a column to a MultiIndex DataFrame?

I have a DataFrame with MultiIndex columns that looks like this:

           ACA FP Equity             UCG IM Equity
              LAST PRICE     VOLUME     LAST PRICE      VOLUME
date
2010-01-04        12.825  5879617.0        15.0292  10844639.0
2010-01-05        13.020  6928587.0        14.8092  16456228.0
2010-01-06        13.250  5290631.0        14.6834  10446450.0
2010-01-07        13.255  5328586.0        15.0292  31900341.0
2010-01-08        13.470  7160295.0        15.1707  40750768.0

What is the syntax for adding a third column under each asset? For example:

df['ACA FP Equity']['PriceVolume'] = df['ACA FP Equity']['LAST PRICE']*3 

But I want to do this for every equity without adding each one by hand.

Thanks in advance.

Answers


If you need to multiply every LAST PRICE column by 3, select those columns with slicers and then rename the second column level:

idx = pd.IndexSlice 
df1 = df.loc[:, idx[:, 'LAST PRICE']].rename(columns={'LAST PRICE':'PriceVolume'}) * 3 
print (df1) 
      ACA FP Equity UCG IM Equity 
      PriceVolume PriceVolume 
2010-01-04  38.475  45.0876 
2010-01-05  39.060  44.4276 
2010-01-06  39.750  44.0502 
2010-01-07  39.765  45.0876 
2010-01-08  40.410  45.5121 

Then concat the result with the original DataFrame:

print (pd.concat([df,df1], axis=1)) 
      ACA FP Equity   UCG IM Equity    ACA FP Equity \ 
       LAST PRICE  VOLUME LAST PRICE  VOLUME PriceVolume 
2010-01-04  12.825 5879617.0  15.0292 10844639.0  38.475 
2010-01-05  13.020 6928587.0  14.8092 16456228.0  39.060 
2010-01-06  13.250 5290631.0  14.6834 10446450.0  39.750 
2010-01-07  13.255 5328586.0  15.0292 31900341.0  39.765 
2010-01-08  13.470 7160295.0  15.1707 40750768.0  40.410 

      UCG IM Equity 
      PriceVolume 
2010-01-04  45.0876 
2010-01-05  44.4276 
2010-01-06  44.0502 
2010-01-07  45.0876 
2010-01-08  45.5121 
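
As the output above shows, concat appends the new columns at the end rather than next to their asset. A minimal sketch of regrouping them by sorting the column index; the two-row `df` here is only a stand-in built from the question's first rows, and `result` is an illustrative name:

```python
import pandas as pd

# Stand-in for the question's frame (first two rows only).
cols = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=cols,
)

idx = pd.IndexSlice
df1 = df.loc[:, idx[:, "LAST PRICE"]].rename(columns={"LAST PRICE": "PriceVolume"}) * 3

# Sorting the columns puts each PriceVolume column back under its asset.
result = pd.concat([df, df1], axis=1).sort_index(axis=1)
print(result.columns.tolist())
```

The sort is lexicographic on both levels, so within each asset the columns come out in alphabetical order rather than insertion order.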

Another solution, without concat, is to build the new column tuples from the columns of selected_df and then assign the output:

idx = pd.IndexSlice 
selected_df = df.loc[:, idx[:, 'LAST PRICE']] 

new_cols = [(x, 'PriceVolume') for x in selected_df.columns.levels[0]] 
print (new_cols) 
[('ACA FP Equity', 'PriceVolume'), ('UCG IM Equity', 'PriceVolume')] 

df[new_cols] = selected_df * 3 
print(df) 
      ACA FP Equity   UCG IM Equity    ACA FP Equity \ 
       LAST PRICE  VOLUME LAST PRICE  VOLUME PriceVolume 
2010-01-04  12.825 5879617.0  15.0292 10844639.0  38.475 
2010-01-05  13.020 6928587.0  14.8092 16456228.0  39.060 
2010-01-06  13.250 5290631.0  14.6834 10446450.0  39.750 
2010-01-07  13.255 5328586.0  15.0292 31900341.0  39.765 
2010-01-08  13.470 7160295.0  15.1707 40750768.0  40.410 

      UCG IM Equity 
      PriceVolume 
2010-01-04  45.0876 
2010-01-05  44.4276 
2010-01-06  44.0502 
2010-01-07  45.0876 
2010-01-08  45.5121 
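
If the new column is meant to combine two existing fields rather than scale one, the same idea extends naturally. The sketch below assumes "PriceVolume" should be LAST PRICE times VOLUME, which the question does not state (it only multiplies by 3); the two-row `df` is a stand-in for the question's data:

```python
import pandas as pd

# Stand-in for the question's frame (first two rows only).
cols = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=cols,
)

# xs picks one field across all assets and drops that level,
# leaving the asset names as ordinary column labels.
price = df.xs("LAST PRICE", axis=1, level=1)
volume = df.xs("VOLUME", axis=1, level=1)

# The multiplication aligns on asset name (assumed meaning of PriceVolume).
product = price * volume

# Restore the second column level before attaching the result.
product.columns = pd.MultiIndex.from_product([product.columns, ["PriceVolume"]])
df = pd.concat([df, product], axis=1).sort_index(axis=1)
print(df)
```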

The most elegant way I can think of is:

df[('ACA FP Equity', 'PriceVolume')] = df[('ACA FP Equity', 'LAST PRICE')].apply(lambda x: x * 3)

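The apply call runs a given function, in this case a lambda expression that multiplies its input by three, on every value in the specified DataFrame column. It returns a pandas Series, which can then be added to the DataFrame as a column.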
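
To cover every asset without listing each one, the same apply call can be placed in a loop. This is a minimal sketch on a two-row stand-in for the question's data; it assigns through a single tuple key, because a chained assignment such as `df[a][b] = ...` may write to a temporary copy and leave `df` unchanged:

```python
import pandas as pd

# Stand-in for the question's frame (first two rows only).
cols = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [[12.825, 5879617.0, 15.0292, 10844639.0],
     [13.020, 6928587.0, 14.8092, 16456228.0]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=cols,
)

# The asset names are collected once, before any column is added.
assets = df.columns.get_level_values(0).unique()

for asset in assets:
    # A single tuple key addresses both column levels in one step.
    df[(asset, "PriceVolume")] = df[(asset, "LAST PRICE")].apply(lambda x: x * 3)

print(df.columns.tolist())
```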

Here is a small example showing how it works on a simple DataFrame:

import pandas as pd 

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]}) 
print(df) 

# Output:
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6


# Add column 'c' 
df['c'] = pd.Series(df['b'].apply(lambda x: x*3)) 
print(df) 

# Output:
#    a  b   c
# 0  1  4  12
# 1  2  5  15
# 2  3  6  18
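
For a plain multiplication like this one, ordinary Series arithmetic should give the same result as apply without calling a Python function per element. A short sketch of the comparison, where `via_apply` and `via_arith` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

# Element-wise function call, as in the example above.
via_apply = df['b'].apply(lambda x: x * 3)

# Vectorized arithmetic on the whole column at once.
via_arith = df['b'] * 3

# Both produce the same Series for this integer column.
print(via_apply.equals(via_arith))

df['c'] = via_arith
print(df)
```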

Thanks for editing the answer. – jezrael
