Python pandas pivot

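I have the following dataframe, and a column goes missing after pivot_table. The dataframe is constructed by reading a csv file. It is a large dataset, but for this question I am using 15 rows from it as an example: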

user_id contrib_count total_min_length  group_space  expert_level 
0  23720  108   1112696    0    l-2 
1  23720   13   442059    1    l-2 
2  23720   12    32180    2    l-2 
3  23720   2    20177    3    l-2 
4  23720   1    1608    10    l-2 
5 1265184   71   260186    0    l-G 
6 1265184   10    3466    2    l-G 
7 1265184   1    12081    4    l-G 
8 513380  112   1049311    0    l-4 
9 513380   1    97    1    l-4 
10 513380  113   361980    2    l-4 
11 513380   19   1198323    3    l-4 
12 513380   2    88301    4    l-4 
13 20251  705   17372707    0    l-G 
14 20251  103   2327178    1    l-G
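The question builds the frame with pd.read_csv. To make the sample reproducible without the real file, here is a minimal sketch that feeds the 15 rows above to read_csv from an in-memory string. SAMPLE_CSV is a hypothetical stand-in for x.csv, not the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for x.csv: the 15 sample rows from the question.
SAMPLE_CSV = """user_id,contrib_count,total_min_length,group_space,expert_level
23720,108,1112696,0,l-2
23720,13,442059,1,l-2
23720,12,32180,2,l-2
23720,2,20177,3,l-2
23720,1,1608,10,l-2
1265184,71,260186,0,l-G
1265184,10,3466,2,l-G
1265184,1,12081,4,l-G
513380,112,1049311,0,l-4
513380,1,97,1,l-4
513380,113,361980,2,l-4
513380,19,1198323,3,l-4
513380,2,88301,4,l-4
20251,705,17372707,0,l-G
20251,103,2327178,1,l-G
"""

# read_csv accepts any file-like object, so StringIO replaces the file path.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.shape)  # (15, 5)
```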

Expected result: after the pivot, this is the dataframe I want:

group_space  0  1  2  3  4  5 6 7 8 9 10 expert_level 
user_id 
20251    705 103 68 24 18  2 6 NaN NaN 5 22  l-G                 
23720    108  13 12  2 NaN NaN NaN NaN NaN NaN 1  l-2

The reason I am doing this is that, once I have it, I can use it for a prediction task in which expert_level is the label.

So far I have done the following to build the matrix above, but after the pivot I cannot get the expert_level column shown in the expected result.

This is what I did:

import pandas as pd


class GroupAnalysis:

    def __init__(self):
        self.df = None
        self.filelocation = '~/somelocation/x.csv'

    def pivot_dataframe(self):
        raw_df = pd.read_csv(self.filelocation)
        # keep rows with group_space 0-10 and index them by (user_id, group_space)
        self.df = raw_df[raw_df['group_space'] < 11].set_index(['user_id', 'group_space'])
        # unstack only the contrib_count series; group_space values become columns
        self.df = self.df['contrib_count'].unstack()

By doing this, I get:

group_space  0  1  2  3  4  5 6 7 8 9 10 
user_id 
20251    705 103 68 24 18  2 6 NaN NaN 5 22                 
23720    108  13 12  2 NaN NaN NaN NaN NaN NaN 1

As you can see, the expert_level column is missing at the end. So the question is: how can I get expert_level into the dataframe, as shown in the "Expected result"?


2014-10-17 Null-Hypothesis

When you unstack, you are only unstacking one series, contrib_count; expert_level and total_min_length have already disappeared at that point.
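To see this on the sample rows, here is a minimal sketch, assuming the 15 rows from the question are loaded into df: selecting ['contrib_count'] after set_index returns a Series, so the unstacked frame has only the group_space columns.

```python
import pandas as pd

# The 15 sample rows from the question.
df = pd.DataFrame({
    'user_id': [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    'contrib_count': [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    'total_min_length': [1112696, 442059, 32180, 20177, 1608, 260186, 3466, 12081,
                         1049311, 97, 361980, 1198323, 88301, 17372707, 2327178],
    'group_space': [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    'expert_level': ['l-2'] * 5 + ['l-G'] * 3 + ['l-4'] * 5 + ['l-G'] * 2,
})

indexed = df.set_index(['user_id', 'group_space'])
# Selecting a single column gives a Series; the other columns are left behind.
series = indexed['contrib_count']
wide = series.unstack()  # group_space values become the columns

print(type(series).__name__)  # Series
print('expert_level' in wide.columns)  # False
```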

Instead of setting the index and unstacking, you can use .pivot():

pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')
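A minimal sketch of the pivot on the 15 sample rows, passing index, columns and values as keyword arguments (pandas 2.0 and later reject them positionally). Missing (user_id, group_space) combinations come out as NaN, which is why the values are floats.

```python
import pandas as pd

# The 15 sample rows from the question.
df = pd.DataFrame({
    'user_id': [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    'contrib_count': [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    'total_min_length': [1112696, 442059, 32180, 20177, 1608, 260186, 3466, 12081,
                         1049311, 97, 361980, 1198323, 88301, 17372707, 2327178],
    'group_space': [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    'expert_level': ['l-2'] * 5 + ['l-G'] * 3 + ['l-4'] * 5 + ['l-G'] * 2,
})

# One row per user_id, one column per group_space, cells from contrib_count.
# pivot raises ValueError if a (user_id, group_space) pair appears twice.
pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')
print(pivoted)
```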

Then create a frame with user_id as the index and expert_level as its column, dropping the duplicates:

lookup = df.drop_duplicates('user_id')[['user_id', 'expert_level']].set_index('user_id')
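drop_duplicates('user_id') keeps the first row for each user, which is only correct if expert_level never changes within a user. A minimal sketch on the sample rows that checks this assumption (the assert is an addition, not part of the answer) before building the lookup:

```python
import pandas as pd

# The 15 sample rows from the question.
df = pd.DataFrame({
    'user_id': [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    'contrib_count': [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    'total_min_length': [1112696, 442059, 32180, 20177, 1608, 260186, 3466, 12081,
                         1049311, 97, 361980, 1198323, 88301, 17372707, 2327178],
    'group_space': [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    'expert_level': ['l-2'] * 5 + ['l-G'] * 3 + ['l-4'] * 5 + ['l-G'] * 2,
})

# Assumption: one expert_level per user; otherwise drop_duplicates would
# silently keep whichever level happens to appear first.
assert df.groupby('user_id')['expert_level'].nunique().eq(1).all()

# One row per user_id, indexed by user_id, with expert_level as the only column.
lookup = (
    df.drop_duplicates('user_id')[['user_id', 'expert_level']]
    .set_index('user_id')
)
print(lookup)
```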

Then join your pivot with the lookup:

result = pivoted.join(lookup)
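Putting the pivot and the lookup together on the sample rows, as a minimal sketch: join aligns on the shared user_id index and appends expert_level as the last column, matching the expected layout.

```python
import pandas as pd

# The 15 sample rows from the question.
df = pd.DataFrame({
    'user_id': [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    'contrib_count': [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    'total_min_length': [1112696, 442059, 32180, 20177, 1608, 260186, 3466, 12081,
                         1049311, 97, 361980, 1198323, 88301, 17372707, 2327178],
    'group_space': [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    'expert_level': ['l-2'] * 5 + ['l-G'] * 3 + ['l-4'] * 5 + ['l-G'] * 2,
})

pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')
lookup = df.drop_duplicates('user_id')[['user_id', 'expert_level']].set_index('user_id')

# join matches rows on the index (user_id) and adds lookup's column at the end.
result = pivoted.join(lookup)
print(result)
```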

Edit: if you also want to include "total_min_length", you can do a second pivot:

pivoted2 = df.pivot(index='user_id', columns='group_space', values='total_min_length')

and join all three instead of just two:

result = pivoted.join(lookup).join(pivoted2, lsuffix="_contrib_count", rsuffix="_total_min_length")

Note that lsuffix and rsuffix are needed to disambiguate the columns, because in your sample data both pivots have columns 0, 1, 2, 3, 4 and 10.
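A minimal sketch of the three-way join on the sample rows. DataFrame.join renames only the overlapping columns, by appending the suffix to the column label, so the integer group_space labels become strings such as '0_contrib_count', while expert_level keeps its name.

```python
import pandas as pd

# The 15 sample rows from the question.
df = pd.DataFrame({
    'user_id': [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    'contrib_count': [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    'total_min_length': [1112696, 442059, 32180, 20177, 1608, 260186, 3466, 12081,
                         1049311, 97, 361980, 1198323, 88301, 17372707, 2327178],
    'group_space': [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    'expert_level': ['l-2'] * 5 + ['l-G'] * 3 + ['l-4'] * 5 + ['l-G'] * 2,
})

pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')
pivoted2 = df.pivot(index='user_id', columns='group_space', values='total_min_length')
lookup = df.drop_duplicates('user_id')[['user_id', 'expert_level']].set_index('user_id')

# Every group_space column appears in both pivots, so each one gets a suffix:
# left-side columns get lsuffix, right-side columns get rsuffix.
result = pivoted.join(lookup).join(
    pivoted2, lsuffix='_contrib_count', rsuffix='_total_min_length'
)
print(list(result.columns))
```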


2014-10-17 22:30:59

I think this works. Is there a way to include 'total_min_length' as well, so I can use it as a feature? – 2014-10-18 02:53:43

@Null-Hypothesis check the edit – 2014-10-18 03:52:28

Thanks for the great answer. One small problem: I get a value error: 'ValueError: columns overlap but no suffix specified: Index([u'expert_level'], dtype='object')' – 2014-10-18 21:57:12
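The comment does not show the code that raised this error, so the following is only one way to reproduce the same message, sketched on the sample rows: joining lookup onto a frame that already contains expert_level, without suffixes, makes DataFrame.join refuse the overlapping column. Joining the two pivots first and adding lookup last means expert_level is added exactly once.

```python
import pandas as pd

# The 15 sample rows from the question.
df = pd.DataFrame({
    'user_id': [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    'contrib_count': [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    'total_min_length': [1112696, 442059, 32180, 20177, 1608, 260186, 3466, 12081,
                         1049311, 97, 361980, 1198323, 88301, 17372707, 2327178],
    'group_space': [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    'expert_level': ['l-2'] * 5 + ['l-G'] * 3 + ['l-4'] * 5 + ['l-G'] * 2,
})

pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')
pivoted2 = df.pivot(index='user_id', columns='group_space', values='total_min_length')
lookup = df.drop_duplicates('user_id')[['user_id', 'expert_level']].set_index('user_id')

with_level = pivoted.join(lookup)
message = ''
try:
    # expert_level is already a column of with_level, and no suffix is given.
    with_level.join(lookup)
except ValueError as err:
    message = str(err)
print(message)

# Joining the two pivots first and lookup last adds expert_level only once.
fixed = pivoted.join(
    pivoted2, lsuffix='_contrib_count', rsuffix='_total_min_length'
).join(lookup)
```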
