2017-03-16

Python pandas: pivot table

I have a table like this:

Name  | ID | Contact_method | Contact
sarah | 1  | house          | h1
sarah | 1  | mobile         | m1
sarah | 1  | email          | [email protected]
bob   | 2  | house          | h2
bob   | 2  | mobile         | m2
bob   | 2  | email          | [email protected]
jones | 3  | house          | h3
jones | 3  | mobile         | m3
jones | 3  | email          | [email protected]
jones | 4  | house          | h4
jones | 4  | mobile         | m4
jones | 4  | email          | [email protected]

And I want it to look like this:

Name  | ID | house | mobile | email
sarah | 1  | h1    | m1     | [email protected]
bob   | 2  | h2    | m2     | [email protected]
jones | 3  | h3    | m3     | [email protected]
jones | 4  | h4    | m4     | [email protected]

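I can already do this, but only by looping over every unique ID with a very expensive pd.concat operation. Is there a simple way to do this? I have also tried pivot() and transpose(). Note that there are duplicate names, so I cannot rely on the uniqueness of column values to perform a join.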
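
To try the answers below, the input table can be built as a DataFrame. This is only a sketch: the e-mail addresses are obfuscated on the original page, so 'e1' to 'e4' are hypothetical placeholders, and the column names are taken from the table above.

```python
import pandas as pd

# Input table from the question. 'e1'..'e4' are hypothetical placeholders
# for the e-mail addresses, which are obfuscated on the original page.
rows = [
    ('sarah', 1, 'house', 'h1'), ('sarah', 1, 'mobile', 'm1'), ('sarah', 1, 'email', 'e1'),
    ('bob', 2, 'house', 'h2'), ('bob', 2, 'mobile', 'm2'), ('bob', 2, 'email', 'e2'),
    ('jones', 3, 'house', 'h3'), ('jones', 3, 'mobile', 'm3'), ('jones', 3, 'email', 'e3'),
    ('jones', 4, 'house', 'h4'), ('jones', 4, 'mobile', 'm4'), ('jones', 4, 'email', 'e4'),
]
df = pd.DataFrame(rows, columns=['Name', 'ID', 'Contact_method', 'Contact'])

# 'Name' alone is not a usable key (two different people are called jones),
# which is why the reshaping has to key on ID as well.
print(df.groupby('Name')['ID'].nunique())
```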

Answers

Set the index to every column except 'Contact', then unstack:

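import pandas as pd  # df holds the table from the question
(df.set_index(['Name', 'ID', 'Contact_method'])['Contact']
   .unstack()                   # one column per Contact_method value
   .rename_axis(None, axis=1)   # drop the leftover 'Contact_method' axis name
   .reset_index())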

    Name  ID             email house mobile
0    bob   2  [email protected]    h2     m2
1  jones   3  [email protected]    h3     m3
2  jones   4  [email protected]    h4     m4
3  sarah   1  [email protected]    h1     m1
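
A self-contained run of the set_index/unstack approach, as a sketch. It uses a rebuilt copy of the question's data with hypothetical placeholder e-mails 'e1' to 'e4'. It works because (Name, ID, Contact_method) identifies each row uniquely, so unstack() can move the Contact_method level into the columns without having to combine any values.

```python
import pandas as pd

# Rebuilt question data; 'e1'..'e4' are placeholder e-mail addresses.
df = pd.DataFrame({
    'Name': ['sarah'] * 3 + ['bob'] * 3 + ['jones'] * 6,
    'ID': [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3,
    'Contact_method': ['house', 'mobile', 'email'] * 4,
    'Contact': ['h1', 'm1', 'e1', 'h2', 'm2', 'e2',
                'h3', 'm3', 'e3', 'h4', 'm4', 'e4'],
})

# Everything except 'Contact' goes into the index; unstack() then turns the
# innermost level (Contact_method) into columns.
wide = (df.set_index(['Name', 'ID', 'Contact_method'])['Contact']
          .unstack()
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```

The two people called jones stay separate because ID is part of the index, so no join on Name is needed.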

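I have a desk set up at my new temporary place while we keep looking for a house. I'm working remotely now and will commute to Seattle twice a month. Soon I have to go back and clear everything out of the old place. Still very busy, but I'm enjoying having time to answer questions. I hope you're doing well! @jezrael – piRSquared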

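@jezrael Yes, I made one legendary big push, and after that I felt I could take it easier. You're pretty much the best pandas answerer ever. My next SO goal is to pass DSM and Jeff on the list. I've never been motivated by reputation for its own sake, and I've already given a lot of it away. I'll reach 100k eventually... I really do need a T-shirt, though. If they ever give you anything, you have to tell me. – piRSquared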

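I'd like to get involved in http://stats.stackexchange.com/ and http://quant.stackexchange.com/. Still, I'd rather pick some other tags to earn gold in. I went after my numpy badge the same way, and I've been neglecting the machine-learning tags. I'd like to get a gold badge in tensorflow (though I still have a lot to learn). – piRSquared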

One way is to build a dictionary of contact dictionaries "manually", keyed on ID. I'm not sure whether it is more efficient.

import pandas as pd

people = {}  # ID -> {column name: value} for that person
for _, row in df.iterrows():
    ID = row['ID']
    if ID not in people:
        people[ID] = {'ID': ID, 'Name': row['Name']}
    people[ID][row['Contact_method']] = row['Contact']

print(pd.DataFrame(people).transpose())  # outer keys become columns, so transpose

And the output is:

  ID   Name             email house mobile
1  1  sarah  [email protected]    h1     m1
2  2    bob  [email protected]    h2     m2
3  3  jones  [email protected]    h3     m3
4  4  jones  [email protected]    h4     m4
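
If the loop is kept, one likely speed-up is to iterate with itertuples() instead of iterrows(); the pandas docs note that itertuples() is generally faster. The sketch below is a variant of the same dictionary-of-dictionaries idea, not the answerer's code. It uses rebuilt data with placeholder e-mails 'e1' to 'e4', and builds the frame with orient='index' so that no transpose is needed.

```python
import pandas as pd

# Rebuilt question data; 'e1'..'e4' are placeholder e-mail addresses.
df = pd.DataFrame({
    'Name': ['sarah'] * 3 + ['bob'] * 3 + ['jones'] * 6,
    'ID': [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3,
    'Contact_method': ['house', 'mobile', 'email'] * 4,
    'Contact': ['h1', 'm1', 'e1', 'h2', 'm2', 'e2',
                'h3', 'm3', 'e3', 'h4', 'm4', 'e4'],
})

people = {}
for row in df.itertuples(index=False):   # namedtuples, cheaper than iterrows()
    rec = people.setdefault(row.ID, {'ID': row.ID, 'Name': row.Name})
    rec[row.Contact_method] = row.Contact

# orient='index': each inner dict becomes one row, indexed by ID.
wide = pd.DataFrame.from_dict(people, orient='index')
print(wide)
```

Any Python-level loop will still be slower than the vectorized set_index/unstack answer on large tables.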

Or you can use pivot:

df.set_index(['ID', 'Name']).pivot(columns='Contact_method', values='Contact').reset_index()
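
A small comparison, assuming a recent pandas version: without values=, pivot keeps the remaining 'Contact' column as the top level of a two-level column index. Passing values='Contact' gives flat columns like the desired output. The data is rebuilt with placeholder e-mails 'e1' to 'e4'.

```python
import pandas as pd

# Rebuilt question data; 'e1'..'e4' are placeholder e-mail addresses.
df = pd.DataFrame({
    'Name': ['sarah'] * 3 + ['bob'] * 3 + ['jones'] * 6,
    'ID': [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3,
    'Contact_method': ['house', 'mobile', 'email'] * 4,
    'Contact': ['h1', 'm1', 'e1', 'h2', 'm2', 'e2',
                'h3', 'm3', 'e3', 'h4', 'm4', 'e4'],
})
indexed = df.set_index(['ID', 'Name'])

# No values=: columns are ('Contact', 'email'), ('Contact', 'house'), ...
nested = indexed.pivot(columns='Contact_method')
print(nested.columns.tolist())

# values='Contact': flat columns email, house, mobile.
flat = indexed.pivot(columns='Contact_method', values='Contact').reset_index()
print(flat)
```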

I think piRSquared's solution is very good, but if you get:

ValueError: Index contains duplicate entries, cannot reshape

print(df)
     Name  ID Contact_method           Contact
0   sarah   1          house                h1
1   sarah   1         mobile                m1
2   sarah   1          email  [email protected]
3     bob   2          house                h2
4     bob   2         mobile                m2
5     bob   2          email  [email protected]
6   jones   3          house                h3
7   jones   3         mobile                m3
8   jones   3          email  [email protected]   <- duplicate Name, ID and Contact_method
9   jones   3          email  [email protected]   <- duplicate Name, ID and Contact_method
10  jones   4          house                h4
11  jones   4         mobile                m4
12  jones   4          email  [email protected]
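
A sketch for checking beforehand whether unstack will hit this error: DataFrame.duplicated with keep=False flags every row whose (Name, ID, Contact_method) key repeats. The two jones e-mails are hypothetical placeholders 'e3a' and 'e3b', since the real addresses are obfuscated.

```python
import pandas as pd

# Minimal data with the same duplicate as rows 8 and 9 above;
# 'e3a' and 'e3b' are placeholder e-mail addresses.
df = pd.DataFrame({
    'Name': ['jones'] * 4,
    'ID': [3, 3, 3, 3],
    'Contact_method': ['house', 'mobile', 'email', 'email'],
    'Contact': ['h3', 'm3', 'e3a', 'e3b'],
})

keys = ['Name', 'ID', 'Contact_method']
dupes = df[df.duplicated(keys, keep=False)]  # all rows sharing a repeated key
print(dupes)

try:
    df.set_index(keys)['Contact'].unstack()
    unstack_ok = True
except ValueError as exc:  # Index contains duplicate entries, cannot reshape
    unstack_ok = False
    print('unstack failed:', exc)
```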

then use pivot_table, or groupby, with ','.join as the aggregation function so that the duplicates are concatenated:

cols = ['Name', 'ID', 'house', 'mobile', 'email']
df1 = (df.pivot_table(index=['ID', 'Name'],
                      columns='Contact_method',
                      values='Contact',
                      aggfunc=','.join)   # join duplicate contacts with commas
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))          # reindex_axis() was removed in pandas 1.0
print(df1)
    Name  ID house mobile                              email
0  sarah   1    h1     m1                   [email protected]
1    bob   2    h2     m2                   [email protected]
2  jones   3    h3     m3  [email protected],[email protected]   <- duplicates joined
3  jones   4    h4     m4                   [email protected]

df1 = (df.groupby(['Name', 'ID', 'Contact_method'])['Contact']
         .apply(','.join)                 # join duplicate contacts with commas
         .unstack()
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))
print(df1)
    Name  ID house mobile                              email
0    bob   2    h2     m2                   [email protected]
1  jones   3    h3     m3  [email protected],[email protected]   <- duplicates joined
2  jones   4    h4     m4                   [email protected]
3  sarah   1    h1     m1                   [email protected]
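
If the duplicates should be collapsed rather than joined, one option is aggfunc='first', which keeps the first contact per (ID, Name, Contact_method). Whether discarding the other values is acceptable depends on the data; that is an assumption here, not something the question states. The sketch uses placeholder e-mails 'e3a', 'e3b' and 'e4'.

```python
import pandas as pd

# Duplicate e-mails for jones/3; 'e3a', 'e3b', 'e4' are placeholders.
df = pd.DataFrame({
    'Name': ['jones'] * 7,
    'ID': [3, 3, 3, 3, 4, 4, 4],
    'Contact_method': ['house', 'mobile', 'email', 'email', 'house', 'mobile', 'email'],
    'Contact': ['h3', 'm3', 'e3a', 'e3b', 'h4', 'm4', 'e4'],
})

cols = ['Name', 'ID', 'house', 'mobile', 'email']
df1 = (df.pivot_table(index=['ID', 'Name'],
                      columns='Contact_method',
                      values='Contact',
                      aggfunc='first')    # keep the first contact, drop the rest
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))
print(df1)
```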