2017-06-29 116 views

Python - How to conditionally slice a MultiIndex DataFrame with pandas

I have a DataFrame with a MultiIndex made of id and month.

[Image: sample DataFrame indexed by id and month, with columns amount1 and amount2]

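For each id (the first index level), I want to slice the month level (the second index level) up to the last row where either amount1 or amount2 is non-zero.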
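The per-id rule above can be expressed with groupby(...).apply: within each id, find the position of the last row where amount1 or amount2 is non-zero and keep every row up to it. This is a minimal sketch of one possible approach, not the method used in the answer. It reuses the example data from the answer further down, assumes the rows within each id are already in chronological order (the months are plain strings, not dates), and the helper name trim_after_last_nonzero is hypothetical. Ids with no non-zero amounts are dropped.

```python
import numpy as np
import pandas as pd

# Example data copied from the answer: (id, month) MultiIndex, two amount columns
ids = [102] * 8 + [103] * 7 + [104] * 10
months = ["11/1/2004", "12/1/2004", "1/1/2005", "2/1/2005", "3/1/2005", "4/1/2005",
          "5/1/2005", "6/1/2005", "4/1/2003", "5/1/2003", "6/1/2003", "7/1/2003",
          "8/1/2003", "9/1/2003", "10/1/2003", "8/1/2003", "9/1/2003", "10/1/2003",
          "11/1/2003", "12/1/2003", "1/1/2004", "2/1/2004", "3/1/2004", "4/1/2004",
          "5/1/2004"]
amount1 = [0, 0, -9100000, 0, 1444.1, 0, 0, 0,
           0, 0, 0, -5.4e7, 0, 0, 0,
           0, 0, 0, 0, -3.3e7, -4.3e7, 0, 0, 0, 0]
amount2 = [1105.900001, 0, 1037.3, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0,
           117.4199962, 117.315, 0, 0, 107.77771641, 105.9499986, 0, 106.3398808, 0, 0]
index = pd.MultiIndex.from_arrays([ids, months], names=["id", "month"])
df = pd.DataFrame({"amount1": amount1, "amount2": amount2}, index=index)


def trim_after_last_nonzero(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep one id's rows up to its last non-zero amount."""
    nonzero = ((group["amount1"] != 0) | (group["amount2"] != 0)).to_numpy()
    if not nonzero.any():
        return group.iloc[:0]  # no non-zero amounts: drop this id entirely
    last = np.flatnonzero(nonzero)[-1]  # position of the last non-zero row
    return group.iloc[: last + 1]


# group_keys=False keeps the original (id, month) index instead of prepending id again
df_out = df.groupby(level="id", group_keys=False).apply(trim_after_last_nonzero)
print(df_out)
```

Because the cutoff is positional within each group, the sketch does not depend on the month strings sorting correctly; it only relies on the existing row order.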

Desired output:
[Image: desired output]

I have tried slicing all ids at once, but I can't figure out how to slice each id with its own cutoff:

df.loc[:,:max(df[df['amount1'] != 0].index)[1]] 
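The attempt above yields a single cutoff for the whole frame: max(...) returns the largest (id, month) tuple across all ids, and only amount1 is tested. In addition, the second slot of df.loc[:, ...] selects columns, not the month level. A minimal sketch of a per-id cutoff, assuming the example data from the answer and chronologically ordered rows within each id: number the rows within each id with cumcount, take the position of each id's last non-zero row as that id's cutoff, and keep the rows at or before it. The variable names are illustrative only.

```python
import pandas as pd

# Example data copied from the answer: (id, month) MultiIndex, two amount columns
ids = [102] * 8 + [103] * 7 + [104] * 10
months = ["11/1/2004", "12/1/2004", "1/1/2005", "2/1/2005", "3/1/2005", "4/1/2005",
          "5/1/2005", "6/1/2005", "4/1/2003", "5/1/2003", "6/1/2003", "7/1/2003",
          "8/1/2003", "9/1/2003", "10/1/2003", "8/1/2003", "9/1/2003", "10/1/2003",
          "11/1/2003", "12/1/2003", "1/1/2004", "2/1/2004", "3/1/2004", "4/1/2004",
          "5/1/2004"]
amount1 = [0, 0, -9100000, 0, 1444.1, 0, 0, 0,
           0, 0, 0, -5.4e7, 0, 0, 0,
           0, 0, 0, 0, -3.3e7, -4.3e7, 0, 0, 0, 0]
amount2 = [1105.900001, 0, 1037.3, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0,
           117.4199962, 117.315, 0, 0, 107.77771641, 105.9499986, 0, 106.3398808, 0, 0]
index = pd.MultiIndex.from_arrays([ids, months], names=["id", "month"])
df = pd.DataFrame({"amount1": amount1, "amount2": amount2}, index=index)

# Position of each row within its id: 0, 1, 2, ...
pos = df.groupby(level="id").cumcount()
# True where either amount is non-zero
nonzero = (df["amount1"] != 0) | (df["amount2"] != 0)
# One cutoff per id: the position of that id's last non-zero row
cutoff = pos[nonzero].groupby(level="id").max()
print(cutoff)

# Broadcast each id's cutoff onto its rows (NaN for an id with no non-zero rows,
# which makes the comparison False and drops that id)
row_cutoff = cutoff.reindex(df.index.get_level_values("id")).to_numpy()
df_out = df[pos.to_numpy() <= row_cutoff]
print(df_out)
```

Keeping the cutoffs in their own Series makes them easy to inspect before slicing.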

Could you share the expected output for the example above? Also, please include any code you have tried. –


@Cedric I updated the question – obabs


Are you sure the output you posted is the one you want? I believe two rows are missing. Please check my answer. –

Answer


There may be a more efficient option, but the following code does what you want:

import pandas as pd 

# We create the original dataframe 
arrays = [[102,102,102,102,102,102,102,102,103,103,103,103,103,103,103,104,104,104,104,104,104,104,104,104,104], 
["11/1/2004","12/1/2004","1/1/2005","2/1/2005","3/1/2005","4/1/2005","5/1/2005","6/1/2005","4/1/2003","5/1/2003","6/1/2003","7/1/2003","8/1/2003","9/1/2003","10/1/2003","8/1/2003","9/1/2003","10/1/2003","11/1/2003","12/1/2003","1/1/2004","2/1/2004","3/1/2004","4/1/2004","5/1/2004"]] 
tuples = list(zip(*arrays)) 
index = pd.MultiIndex.from_tuples(tuples, names=['id', 'month']) 
amount1 = [0,0,-9100000,0,1444.1,0,0,0,0,0,0,-5.4e7,0,0,0,0,0,0,0,-3.3e7,-4.3e7,0,0,0,0] 
amount2 = [1105.900001,0,1037.3,0,0,0,0,0,0,0,0,0,0,0,0,117.4199962,117.315,0,0,107.77771641,105.9499986,0,106.3398808,0,0] 
df = pd.DataFrame({"amount1": amount1, "amount2": amount2},index=index) 

# Slice the DataFrame id by id
df_out_list = []
for id_ in df.index.levels[0]:
    # Rows for this id, indexed by month only
    df2 = df.xs(id_, level='id')
    # Rows where amount1 or amount2 is non-zero
    df2_nonzeros = df2[(df2['amount1'] != 0) | (df2['amount2'] != 0)]
    if df2_nonzeros.empty:
        continue  # no non-zero amounts for this id, so keep none of its rows
    # Label-based slice from the first month up to the last non-zero month (inclusive)
    df2_result = df2.loc[:df2_nonzeros.index[-1]]
    # Put the id level back in front of the month level
    index_result = pd.MultiIndex.from_product([[id_], df2_result.index], names=['id', 'month'])
    df_out_list.append(df2_result.set_axis(index_result))

# Create the output DataFrame by concatenating the per-id pieces
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
df_out = pd.concat(df_out_list)

print(df)
print(df_out)
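As the answer notes, a more efficient option may exist. One vectorized sketch (an alternative technique, not benchmarked here) avoids the Python loop: flag the non-zero rows, reverse the frame, take a cumulative maximum within each id so the flag stays on from the last non-zero row backwards, then reverse again. It assumes the same example data and the same chronological row order within each id.

```python
import pandas as pd

# Example data copied from the answer: (id, month) MultiIndex, two amount columns
ids = [102] * 8 + [103] * 7 + [104] * 10
months = ["11/1/2004", "12/1/2004", "1/1/2005", "2/1/2005", "3/1/2005", "4/1/2005",
          "5/1/2005", "6/1/2005", "4/1/2003", "5/1/2003", "6/1/2003", "7/1/2003",
          "8/1/2003", "9/1/2003", "10/1/2003", "8/1/2003", "9/1/2003", "10/1/2003",
          "11/1/2003", "12/1/2003", "1/1/2004", "2/1/2004", "3/1/2004", "4/1/2004",
          "5/1/2004"]
amount1 = [0, 0, -9100000, 0, 1444.1, 0, 0, 0,
           0, 0, 0, -5.4e7, 0, 0, 0,
           0, 0, 0, 0, -3.3e7, -4.3e7, 0, 0, 0, 0]
amount2 = [1105.900001, 0, 1037.3, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0,
           117.4199962, 117.315, 0, 0, 107.77771641, 105.9499986, 0, 106.3398808, 0, 0]
index = pd.MultiIndex.from_arrays([ids, months], names=["id", "month"])
df = pd.DataFrame({"amount1": amount1, "amount2": amount2}, index=index)

# 1 where either amount is non-zero, else 0
nonzero = ((df["amount1"] != 0) | (df["amount2"] != 0)).astype(int)
# Walking each id from its last row backwards, cummax switches to 1 at the last
# non-zero row and stays 1 for every earlier row; reversing back restores the order
keep = nonzero.iloc[::-1].groupby(level="id").cummax().iloc[::-1].astype(bool)
df_out = df[keep.to_numpy()]
print(df_out)
```

Since every step is a whole-column operation, the work is done inside pandas rather than in a Python loop over ids.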

The output looks like this:

[Image: printed output of df and df_out]