2014-09-23

Pandas: sort by group

I've already seen this question, but the result it asks for is slightly different from what I want.

Imagine a DataFrame grouped like this:

df.groupby(['product_name', 'usage_type']).total_cost.sum() 

product_name  usage_type
Lorem         A       30.694665
              B        0.000634
              C        1.659360
              D        0.000031
              E     3339.140042
              F        0.074340
Ipsum         G        9.627360
              A       19.053377
              D       14.492155
Dolor         B        9.698245
              H     6993.792163
              C    31947.955679
              D     2150.400001
              E       26.337789
Name: total_cost, dtype: float64

I want the output to have the same structure, but with two properties:

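  1. Product names ordered by the sum of their costs
  2. Usage types sorted lexicographically (I'd be happy with an alternative: sorted by cost, descending)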

That way the most expensive product comes first, while the per-usage-type breakdown is preserved.

If it makes things much simpler, I can give up the secondary sort on usage type.
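For the alternative ordering mentioned in the question (usage types by descending cost within each product), a minimal sketch could rebuild the grouped Series from the values shown above, broadcast each product's total onto its rows, and sort on that total plus the row's own cost. The names `s`, `totals`, `frame` and `ordered` are illustrative, not from the original post.

```python
import pandas as pd

# Rebuild the grouped Series from the question (values copied from the output above).
idx = pd.MultiIndex.from_tuples(
    [("Lorem", "A"), ("Lorem", "B"), ("Lorem", "C"), ("Lorem", "D"),
     ("Lorem", "E"), ("Lorem", "F"), ("Ipsum", "G"), ("Ipsum", "A"),
     ("Ipsum", "D"), ("Dolor", "B"), ("Dolor", "H"), ("Dolor", "C"),
     ("Dolor", "D"), ("Dolor", "E")],
    names=["product_name", "usage_type"],
)
s = pd.Series(
    [30.694665, 0.000634, 1.659360, 0.000031, 3339.140042, 0.074340,
     9.627360, 19.053377, 14.492155,
     9.698245, 6993.792163, 31947.955679, 2150.400001, 26.337789],
    index=idx, name="total_cost",
)

# Each product's total cost, repeated on every row of that product.
totals = s.groupby(level="product_name").transform("sum")

# Sort by product total (descending), then by the row's own cost (descending).
frame = s.to_frame("cost").assign(total=totals)
ordered = frame.sort_values(["total", "cost"], ascending=[False, False])["cost"]
print(ordered)
```

Swapping the second key for the `usage_type` level (with `ascending=[False, True]`) would give the lexicographic variant instead.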

Answer


Starting from the grouped DataFrame:

import pandas as pd
# 'data' is assumed to be a whitespace-separated text file with the columns
# product_name, usage_type and val (one row per product/usage pair).
df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])
#                                  val
# product_name usage_type
# Lorem        A             30.694665
#              B              0.000634
#              C              1.659360
#              D              0.000031
#              E           3339.140042
#              F              0.074340
# Ipsum        G              9.627360
#              A             19.053377
#              D             14.492155
# Dolor        B              9.698245
#              H           6993.792163
#              C          31947.955679
#              D           2150.400001
#              E             26.337789

you can store the sort keys in new columns:

df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum') 
df2['key2'] = df2.index.get_level_values('usage_type') 
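The choice of `transform('sum')` rather than `sum()` is what makes `key1` work as a column: `sum()` collapses each group to one row, while `transform` returns a result aligned with the original index. A small sketch on hypothetical data (`p1`, `p2` and `val` are made up for illustration):

```python
import pandas as pd

# Tiny hypothetical two-product example with the same index layout as df2.
idx = pd.MultiIndex.from_tuples(
    [("p1", "A"), ("p1", "B"), ("p2", "A")],
    names=["product_name", "usage_type"],
)
val = pd.Series([1.0, 2.0, 5.0], index=idx, name="val")

# .sum() reduces to one row per product: p1 -> 3.0, p2 -> 5.0
per_product = val.groupby(level="product_name").sum()
print(per_product)

# .transform("sum") repeats each product's total on every one of its rows and keeps
# the original MultiIndex, so it can be assigned straight back as a column.
broadcast = val.groupby(level="product_name").transform("sum")
print(broadcast)
```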

and then sort by those key columns:

# >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
#                                  val          key1 key2
# product_name usage_type
# Dolor        B              9.698245  41128.183877    B
#              C          31947.955679  41128.183877    C
#              D           2150.400001  41128.183877    D
#              E             26.337789  41128.183877    E
#              H           6993.792163  41128.183877    H
# Lorem        A             30.694665   3371.569072    A
#              B              0.000634   3371.569072    B
#              C              1.659360   3371.569072    C
#              D              0.000031   3371.569072    D
#              E           3339.140042   3371.569072    E
#              F              0.074340   3371.569072    F
# Ipsum        A             19.053377     43.172892    A
#              D             14.492155     43.172892    D
#              G              9.627360     43.172892    G

result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
print(result) 
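On pandas 0.23 or later, `DataFrame.sort_values` also accepts index level names in `by`, so the `key2` helper column is not strictly needed. A sketch under that assumption, using a hypothetical subset of the question's rows so the block runs on its own:

```python
import pandas as pd

# Hypothetical subset of the question's rows, indexed like df2 in the answer.
df2 = pd.DataFrame(
    {
        "product_name": ["Lorem", "Lorem", "Ipsum", "Ipsum", "Dolor", "Dolor"],
        "usage_type": ["A", "E", "G", "A", "H", "C"],
        "val": [30.694665, 3339.140042, 9.627360, 19.053377, 6993.792163, 31947.955679],
    }
).set_index(["product_name", "usage_type"])

# Per-product total, broadcast onto every row of that product.
df2["key1"] = df2.groupby(level="product_name")["val"].transform("sum")

# `by` mixes a column label ("key1") with an index level name ("usage_type").
result = df2.sort_values(["key1", "usage_type"], ascending=[False, True])["val"]
print(result)
```

This avoids copying the `usage_type` level into a column; the trade-off is that it relies on the newer `sort_values` behavior.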

yields

product_name  usage_type
Dolor         B        9.698245
              C    31947.955679
              D     2150.400001
              E       26.337789
              H     6993.792163
Lorem         A       30.694665
              B        0.000634
              C        1.659360
              D        0.000031
              E     3339.140042
              F        0.074340
Ipsum         A       19.053377
              D       14.492155
              G        9.627360
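If the secondary ordering is dropped, as the question allows, a single-key stable sort is enough: `kind="mergesort"` keeps rows with equal `key1` (i.e. the rows of one product) in whatever order they already had. A sketch on a hypothetical subset of the data, deliberately left in unsorted order:

```python
import pandas as pd

# Hypothetical subset of the question's rows, in their original (unsorted) order.
df2 = pd.DataFrame(
    {
        "product_name": ["Lorem", "Lorem", "Ipsum", "Ipsum", "Dolor", "Dolor"],
        "usage_type": ["E", "A", "G", "A", "H", "C"],
        "val": [3339.140042, 30.694665, 9.627360, 19.053377, 6993.792163, 31947.955679],
    }
).set_index(["product_name", "usage_type"])

df2["key1"] = df2.groupby(level="product_name")["val"].transform("sum")

# Stable sort on the group total only: rows of the same product keep their
# existing relative order (H before C, E before A, G before A).
result = df2.sort_values("key1", ascending=False, kind="mergesort")["val"]
print(result)
```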

Very nice, thanks! – 2014-09-23 14:13:09