Method-chaining solution for dropping a column level in a pandas DataFrame

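I use a lot of method chaining when reshaping and querying data in pandas DataFrames. Sometimes this creates extra, unnecessary levels in the index (rows) and in the columns. When that happens on the index (row axis), it is easy to fix with DataFrame.reset_index():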

(df.query('some query')
    .apply(cool_func)
    .reset_index('unwanted_index_level', drop=True)  # <====
    .apply(another_cool_func))

reset_index lets you continue the method chain and keep working with the DataFrame.
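To make the row-axis case concrete, here is a minimal runnable sketch. The frame, the level names "keep" and "unwanted", and the doubling lambda are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical frame with a two-level row index named "keep" and "unwanted".
idx = pd.MultiIndex.from_tuples(
    [("a", 0), ("b", 1), ("c", 2)], names=["keep", "unwanted"]
)
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=idx)

result = (
    df.query("x > 1")
      .reset_index("unwanted", drop=True)  # drop one row-index level mid-chain
      .apply(lambda col: col * 2)          # the chain simply continues
)
print(result)
```

After the reset_index step only the "keep" level is left on the rows, and the result is still a DataFrame.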

However, I have never found an equivalent solution for the column axis. Is there one?


2016-11-17 dmeu

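Are you looking for '.drop' to drop a column? – James

Hi, no, I want to drop one level of the 'MultiIndex' on the 'DataFrame.columns' axis. – dmeu

How should duplicate column names be handled when a column index level is dropped? – James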

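You can simply stack the column level (moving it into the index) and call reset_index with drop=True, or you can use reset_index() as a starting point (see frame.py#L2940) to write a reset_columns() method.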

(df.query('some query')
    .apply(cool_func)
    .stack(level='unwanted_col_level_name')
    .reset_index('unwanted_col_level_name', drop=True)
    .apply(another_cool_func))
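A small runnable sketch of the stack approach, under the assumption that the unwanted column level holds a single value. The data and the level names "unwanted" and "keep" are hypothetical. If the level had several distinct values, stack would produce one row per value, so the result would no longer be a plain "level dropped" frame. Newer pandas versions may also emit a FutureWarning about the stack implementation.

```python
import pandas as pd

# Hypothetical frame: the "unwanted" column level has only one value.
cols = pd.MultiIndex.from_tuples(
    [("only", "A"), ("only", "B")], names=["unwanted", "keep"]
)
df = pd.DataFrame([[17, 2], [4, 12]], columns=cols)

result = (
    df.stack(level="unwanted")               # move the level into the row index
      .reset_index("unwanted", drop=True)    # then drop it from the rows
)
print(result)
```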

Alternative: monkey-patch solution

import pandas as pd

def drop_column_levels(self, level=None, inplace=False):
    """
    For a DataFrame with multi-level columns, drop one or more levels.
    For a standard index, or if dropping all levels of the MultiIndex, revert
    to a plain RangeIndex for the column names.

    Parameters
    ----------
    level : int, str, tuple, or list, default None
        Only remove the given levels from the columns. Removes all levels by
        default.
    inplace : boolean, default False
        Modify the DataFrame in place (do not create a new object).

    Returns
    -------
    DataFrame, or None if inplace=True
    """
    new_obj = self if inplace else self.copy()

    # Fall back to 0..n-1 column labels unless only some levels are dropped.
    new_columns = pd.RangeIndex(len(new_obj.columns))
    if isinstance(self.columns, pd.MultiIndex):
        if level is not None:
            if not isinstance(level, (tuple, list)):
                level = [level]
            # Resolve level names to positions on the column MultiIndex.
            level = [self.columns._get_level_number(lev) for lev in level]
            if len(level) < len(self.columns.levels):
                new_columns = self.columns.droplevel(level)

    new_obj.columns = new_columns
    if not inplace:
        return new_obj

# Monkey patch the DataFrame class
pd.DataFrame.drop_column_levels = drop_column_levels
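If your pandas is recent enough, patching may not be necessary: to the best of my knowledge DataFrame.droplevel, added in pandas 0.24 (after this thread was written), accepts axis=1 and fits into a chain directly. A minimal sketch with made-up data and level names follows. Note that it does nothing about duplicate labels, so the remaining level should be unique across columns.

```python
import pandas as pd

# Hypothetical frame with a two-level column index.
cols = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("foo", "B"), ("bar", "C"), ("bar", "D")],
    names=["group", "letter"],
)
df = pd.DataFrame([[17, 2, 0, 3], [4, 12, 40, 11]], columns=cols)

result = (
    df.droplevel("group", axis=1)        # drop a column level without patching
      .apply(lambda col: col * 2)        # keep chaining as usual
)
print(result)
```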


2016-11-17 14:22:46

Great! I didn't know about the 'stack' function. It can be used for other things too. It works nicely. – dmeu

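One option that allows continued dot chaining is to define a new method on the pd.DataFrame class that reduces the column index levels. This is called monkey patching, and it reduces the portability of your code.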

def reset_column_index(self, inplace=False):
    # Join each column tuple into one underscore-separated label.
    if inplace:
        self.columns = ['_'.join(map(str, tup)) for tup in self.columns]
    else:
        c = self.copy()
        c.columns = ['_'.join(map(str, tup)) for tup in c.columns]
        return c

pd.DataFrame.reset_column_index = reset_column_index

(df.query('some query')
    .apply(cool_func)
    .reset_column_index()
    .apply(another_cool_func))

This method flattens the multi-index columns into a single index and joins the names with underscores.

#   foo     bar
#     A   B   A   B
# 0  17   2   0   3
# 1   4  12  40  11

becomes

#    foo_A  foo_B  bar_A  bar_B
# 0     17      2      0      3
# 1      4     12     40     11
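The same flattening can be sketched without touching the DataFrame class by passing a plain function to DataFrame.pipe, which keeps the chain intact. The helper name flatten_columns and its sep parameter are hypothetical, and the data mirrors the example above.

```python
import pandas as pd

def flatten_columns(frame, sep="_"):
    """Return a copy of `frame` with MultiIndex columns joined into strings."""
    out = frame.copy()
    # str() guards against non-string level values such as integers.
    out.columns = [sep.join(map(str, tup)) for tup in out.columns]
    return out

cols = pd.MultiIndex.from_product([["foo", "bar"], ["A", "B"]])
df = pd.DataFrame([[17, 2, 0, 3], [4, 12, 40, 11]], columns=cols)

flat = df.pipe(flatten_columns)   # usable mid-chain, no monkey patching
print(list(flat.columns))         # ['foo_A', 'foo_B', 'bar_A', 'bar_B']
```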


2016-11-17 14:49:54 James

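Thanks for the suggestion. I think it is valid, but I tend to prefer options that are already "packaged", for compatibility, so that I don't always have to define the same function. – dmeu

I completely agree. @Julien's answer seems to work well. – James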

I just found another workaround myself: using the DataFrame's .T attribute, which is equivalent to DataFrame.transpose().

(df.query('some query')
    .apply(cool_func)
    .T.reset_index('unwanted_col_level_name', drop=True).T
    .apply(another_cool_func))
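A minimal sketch of the transpose round trip, with made-up data and level names. One caveat worth checking on your own data: as far as I can tell, transposing a frame with mixed column dtypes goes through object dtype, so the columns come back as object after the second .T.

```python
import pandas as pd

# Hypothetical frame: the "unwanted" column level holds a single value.
cols = pd.MultiIndex.from_tuples(
    [("only", "A"), ("only", "B")], names=["unwanted", "keep"]
)
df = pd.DataFrame([[17, 2], [4, 12]], columns=cols)

# Transpose, drop the level from what is now the row index, transpose back.
result = df.T.reset_index("unwanted", drop=True).T
print(list(result.columns))  # ['A', 'B']

# Caveat: with mixed dtypes the round trip leaves object columns.
mixed = pd.DataFrame({("only", "A"): [1, 2], ("only", "B"): ["x", "y"]})
mixed.columns.names = ["unwanted", "keep"]
roundtrip = mixed.T.reset_index("unwanted", drop=True).T
print(roundtrip.dtypes)
```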


2016-11-17 15:00:45 dmeu
