Filter, then rank, a MultiIndex DataFrame

I have a pandas DataFrame with two columns (Col1 and Col2) and a MultiIndex (Date and Symbol), as follows:

                   Col1  Col2
Date       Symbol
2015-12-01 AAA     0.45  0.53
           BBB    -1.02 -0.57
           CCC    -0.41  0.30
2015-11-02 AAA     0.59 -0.42
           BBB    -2.16 -0.77
           CCC    -1.02  1.09
2015-10-01 AAA    -0.44 -0.88
           BBB     0.52  0.27
           CCC    -1.76  0.63

The code to reproduce this DataFrame is:

import pandas as pd

df = pd.DataFrame({'Date': ['2015-12-01']*3 + ['2015-11-02']*3 + ['2015-10-01']*3,
                   'Symbol': ['AAA','BBB','CCC']*3,
                   'Col1': [0.45,-1.02,-0.41,0.59,-2.16,-1.02,-0.44,0.52,-1.76],
                   'Col2': [0.53,-0.57,0.3,-0.42,-0.77,1.09,-0.88,0.27,0.63]}
                  ).set_index(['Date', 'Symbol'])

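For each date, I want to select the top n rows (2 in this case) based on the largest values in Col1, and then rank those rows by their values in Col2 (largest == 1, second largest == 2, etc.). With the result added as a column to the original DataFrame, the final DataFrame should look like this: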

                   Col1  Col2  Rank
Date       Symbol
2015-12-01 AAA     0.45  0.53     1
           CCC    -0.41  0.30     2
           BBB    -1.02 -0.57   NaN
2015-11-02 CCC    -1.02  1.09     1
           AAA     0.59 -0.42     2
           BBB    -2.16 -0.77   NaN
2015-10-01 BBB     0.52  0.27     1
           AAA    -0.44 -0.88     2
           CCC    -1.76  0.63   NaN

I tried using groupby and the rank function, but I couldn't get the index right.

For example, df.reset_index().groupby(['Date'])['Col1'].nlargest(2) yields:

Date
2015-10-01  7    0.52
            6   -0.44
2015-11-02  3    0.59
            5   -1.02
2015-12-01  0    0.45
            2   -0.41

But I can't figure out how to rank these rows and put the result back into the DataFrame.

Source

2016-01-13 CurryPy

You can do the following:

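# Keep the 2 largest Col1 values per date; rows outside the top 2 become NaN on assignment
df['largest'] = df.groupby(level='Date').apply(lambda x: x.Col1.nlargest(2)).reset_index(0, drop=True)
# Among the kept rows, rank Col2 within each date (largest == 1)
df['ranked'] = df.groupby(level='Date').apply(lambda x: x.dropna(subset=['largest']).Col2.rank(ascending=False)).reset_index(0, drop=True)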

This gives:

                   Col1  Col2  largest  ranked
Date       Symbol
2015-12-01 AAA     0.45  0.53     0.45       1
           BBB    -1.02 -0.57      NaN     NaN
           CCC    -0.41  0.30    -0.41       2
2015-11-02 AAA     0.59 -0.42     0.59       2
           BBB    -2.16 -0.77      NaN     NaN
           CCC    -1.02  1.09    -1.02       1
2015-10-01 AAA    -0.44 -0.88    -0.44       2
           BBB     0.52  0.27     0.52       1
           CCC    -1.76  0.63      NaN     NaN
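
If only a single Rank column is wanted, without the intermediate largest column, the same filter-then-rank idea could be sketched with two grouped rank calls. This is a variant rather than the approach posted above: the names n, col1_rank and top are illustrative, and method="first" is an assumption about how ties in Col1 should be broken.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "Date": ["2015-12-01"] * 3 + ["2015-11-02"] * 3 + ["2015-10-01"] * 3,
    "Symbol": ["AAA", "BBB", "CCC"] * 3,
    "Col1": [0.45, -1.02, -0.41, 0.59, -2.16, -1.02, -0.44, 0.52, -1.76],
    "Col2": [0.53, -0.57, 0.3, -0.42, -0.77, 1.09, -0.88, 0.27, 0.63],
}).set_index(["Date", "Symbol"])

n = 2  # number of rows to keep per date (illustrative name)

# Position of each row within its date when ordered by Col1, largest first.
# method="first" breaks ties by row order (an assumption, not from the question).
col1_rank = df.groupby(level="Date")["Col1"].rank(ascending=False, method="first")

# Keep only the top n rows per date; the (Date, Symbol) index is preserved.
top = df[col1_rank <= n]

# Rank Col2 among the kept rows within each date. Assignment aligns on the
# index, so rows that were filtered out receive NaN.
df["Rank"] = top.groupby(level="Date")["Col2"].rank(ascending=False)

print(df)
```

Because the data is never passed through reset_index, the ranked values line up with the original rows by their (Date, Symbol) labels when assigned back.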

Source

2016-01-13 16:02:52 Stefan

Thank you for the elegant solution; this is exactly what I was trying to accomplish. – CurryPy
