2017-08-01

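Pandas stack should not sort the remaining index

I am wondering whether it is possible to unstack one level of a MultiIndex DataFrame so that the remaining index levels of the returned DataFrame are not sorted. Code example: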

import pandas as pd

arrays = [["room1", "room1", "room1", "room1", "room1", "room1",
           "room2", "room2", "room2", "room2", "room2", "room2"],
          ["bed1", "bed1", "bed1", "bed2", "bed2", "bed2",
           "bed1", "bed1", "bed1", "bed2", "bed2", "bed2"],
          ["blankets", "pillows", "all", "blankets", "pillows", "all",
           "blankets", "pillows", "all", "blankets", "pillows", "all"]]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=['first index',
                                                 'second index', 'third index'])

series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3], index=index)

series

first index  second index  third index
room1        bed1          blankets       1
                           pillows        2
                           all            3
             bed2          blankets       1
                           pillows        1
                           all            2
room2        bed1          blankets       2
                           pillows        2
                           all            4
             bed2          blankets       2
                           pillows        1
                           all            3

Unstacking the second index:

series.unstack(1) 

second index             bed1  bed2
first index third index
room1       all             3     2
            blankets        1     1
            pillows         2     1
room2       all             4     3
            blankets        2     2
            pillows         2     1
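A quick way to see where the alphabetical order comes from (a sketch; it rebuilds the same Series with MultiIndex.from_product instead of zip, which is an assumption for brevity): MultiIndex constructors such as from_tuples and from_product store the distinct labels of each level in sorted order, while the rows themselves stay in insertion order. The unstacked rows appear to follow the sorted level order rather than the order of the data.

```python
import pandas as pd

# Same labels and values as the question's Series
index = pd.MultiIndex.from_product(
    [["room1", "room2"], ["bed1", "bed2"], ["blankets", "pillows", "all"]],
    names=["first index", "second index", "third index"])
series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3], index=index)

# The level itself holds its distinct labels sorted alphabetically
print(list(series.index.levels[2]))  # ['all', 'blankets', 'pillows']

# The rows are still in insertion order
print(list(series.index.get_level_values(2).unique()))  # ['blankets', 'pillows', 'all']

# After unstacking, the remaining 'third index' follows the sorted level
unstacked = series.unstack(1)
print(list(unstacked.index.get_level_values("third index")[:3]))  # ['all', 'blankets', 'pillows']
```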

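The problem is that the order of the third index has changed, because the index is sorted automatically and alphabetically. The 'all' row, which is the sum of the 'blankets' and 'pillows' rows, is now the first row instead of the last. How can I fix this? There does not seem to be an option to prevent the automatic sorting. There also does not seem to be a way to sort a DataFrame's index by a key, e.g. myDataFrame.sort_index(..., key=['some_key']).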
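Later pandas versions (1.1 and newer, released after this question) do give sort_index a key parameter, but it takes a vectorized callable rather than a list of labels. A minimal sketch, assuming pandas >= 1.1: rank and by_custom_order are hypothetical names introduced here. For a MultiIndex the key is called once per level, so the function remaps only the 'third index' level and returns the other levels unchanged. The key only determines the order; the original labels are kept in the result.

```python
import pandas as pd

# Rebuild the question's Series compactly (same labels and values)
index = pd.MultiIndex.from_product(
    [["room1", "room2"], ["bed1", "bed2"], ["blankets", "pillows", "all"]],
    names=["first index", "second index", "third index"])
series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3], index=index)

# Hypothetical mapping: desired position of each 'third index' label
rank = {"blankets": 0, "pillows": 1, "all": 2}

def by_custom_order(level_values: pd.Index) -> pd.Index:
    # sort_index applies the key to each MultiIndex level separately;
    # remap only 'third index', leave 'first index' as it is
    if level_values.name == "third index":
        return level_values.map(rank)
    return level_values

result = series.unstack(1).sort_index(key=by_custom_order)
print(result)
```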

Answer


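One possible solution is reindex or reindex_axis with the parameter level=1 (reindex_axis was deprecated in pandas 0.21 and removed in 1.0, so only reindex works in current versions):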

s = series.unstack(1).reindex(['blankets','pillows','all'], level=1) 
print (s) 
second index             bed1  bed2
first index third index
room1       blankets        1     1
            pillows         2     1
            all             3     2
room2       blankets        2     2
            pillows         2     1
            all             4     3

# pandas < 1.0 only: reindex_axis was removed in pandas 1.0
s = series.unstack(1).reindex_axis(['blankets','pillows','all'], level=1)
print (s) 
second index             bed1  bed2
first index third index
room1       blankets        1     1
            pillows         2     1
            all             3     2
room2       blankets        2     2
            pillows         2     1
            all             4     3
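A different technique that avoids listing the labels at the reindex step (a sketch, not part of the answer above): give the 'third index' labels a Categorical dtype with the desired category order before unstacking. When the MultiIndex is rebuilt from a Categorical column, the level appears to keep the category order, so unstack sorts by that order instead of alphabetically. The intermediate column name count is an assumption.

```python
import pandas as pd

# Rebuild the question's Series (same labels and values)
index = pd.MultiIndex.from_product(
    [["room1", "room2"], ["bed1", "bed2"], ["blankets", "pillows", "all"]],
    names=["first index", "second index", "third index"])
series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3], index=index)

# Flatten to a DataFrame so the label column can be retyped ("count" is an assumed name)
df = series.reset_index(name="count")

# The categories list fixes the order: blankets, pillows, all
df["third index"] = pd.Categorical(
    df["third index"], categories=["blankets", "pillows", "all"], ordered=True)

# set_index keeps the category order as the level order,
# so unstack no longer puts 'all' first
result = (df.set_index(["first index", "second index", "third index"])["count"]
            .unstack("second index"))
print(result)
```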

A more dynamic solution, using the order in which the labels first appear:

a = series.index.get_level_values('third index').unique() 
print (a) 
Index(['blankets', 'pillows', 'all'], dtype='object', name='third index') 

s = series.unstack(1).reindex(a, level=1)
print (s) 
second index             bed1  bed2
first index third index
room1       blankets        1     1
            pillows         2     1
            all             3     2
room2       blankets        2     2
            pillows         2     1
            all             4     3
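The dynamic approach can be wrapped in a small helper that restores first-appearance order for every remaining row level, not only 'third index', and for the new columns too. This is a hypothetical function (unstack_keep_order is not a pandas API), assuming all index levels are named and at least two levels remain after unstacking. It relies on Index.unique() returning labels in order of appearance and on reindex with level= reordering one level at a time.

```python
import pandas as pd

def unstack_keep_order(s: pd.Series, level) -> pd.DataFrame:
    """Hypothetical helper: unstack `level`, then put the remaining row levels
    and the new columns back into first-appearance order.
    Assumes every index level of `s` is named and >= 2 levels remain."""
    out = s.unstack(level)
    for name in out.index.names:
        # Index.unique() keeps the order of appearance; it does not sort
        order = s.index.get_level_values(name).unique()
        out = out.reindex(order, level=name)
    # unstack sorts the new columns as well; restore their order too
    return out.reindex(columns=s.index.get_level_values(level).unique())

# Usage with the question's data
index = pd.MultiIndex.from_product(
    [["room1", "room2"], ["bed1", "bed2"], ["blankets", "pillows", "all"]],
    names=["first index", "second index", "third index"])
series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3], index=index)

result = unstack_keep_order(series, "second index")
print(result)
```

Reindexing the outer level first matters: the later reindex on an inner level groups rows by the outer level's order as it stands at that point.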