2017-10-17

Convert a long table to wide and create columns based on rows

I have a DataFrame that looks like this:

Customer_ID  Category Products 
    1    Veg   A 
    2    Veg   B 
    3    Fruit  A 
    3    Fruit  B 
    3    Veg   B 
    1    Fruit  A 
    3    Veg   C 
    1    Fruit  C 

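For each customer ID and each category, I want to find out which products were bought, and create one column per product accordingly. The output should look like this: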

Customer_ID  Category Pro_1 Pro_2  Pro_3 
    1    Veg  A  NA   NA 
    1    Fruit  A  NA   C 
    2    Veg  NA  B   NA 
    3    Veg  NA  B   C 
    3    Fruit  A  B   NA 
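To make the answers easy to run, the sample input can also be built directly as a DataFrame. The following is a minimal sketch (assuming pandas is installed; the name bought is illustrative) that reproduces the input and lists which products each (Customer_ID, Category) group bought, which is the information the wide table encodes:

```python
import pandas as pd

# Sample data from the question, typed in directly
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

# Collect the products bought in each (Customer_ID, Category) group as a list
bought = df.groupby(['Customer_ID', 'Category'])['Products'].agg(list)
print(bought)
```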

Answers

Use groupby and unstack, but note that if rows are duplicated, their data is concatenated together:

df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack() 
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)] 
df = df.reset_index() 
print (df) 
   Customer_ID Category Pro_1 Pro_2 Pro_3
0            1    Fruit     A  None     C
1            1      Veg     A  None  None
2            2      Veg  None     B  None
3            3    Fruit     A     B  None
4            3      Veg  None     B     C
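The concatenation caveat is easy to see by adding a duplicate row. This sketch adds a second (1, Fruit, A) row purely for illustration: with sum() the repeated product becomes 'AA', while first() (a swapped-in alternative, not part of the original answer) keeps a single product name per cell:

```python
import pandas as pd

# Question data plus one duplicated row (1, 'Fruit', 'A'), added only to show the caveat
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C', 'A'],
})

keys = ['Customer_ID', 'Category', 'Products']

# sum() on a string column concatenates, so the duplicate becomes 'AA'
wide = df.groupby(keys)['Products'].sum().unstack()

# first() keeps only one value per (Customer_ID, Category, Products) triple
wide_first = df.groupby(keys)['Products'].first().unstack()
print(wide)
print(wide_first)
```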

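Another solution with a helper column; here the (Customer_ID, Category, Products) triples must be unique:

# if the triples are not unique, remove the duplicates first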
df = df.drop_duplicates(['Customer_ID','Category','Products']) 

df['a'] = df['Products'] 
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack() 
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)] 
df = df.reset_index() 
print (df) 
   Customer_ID Category Pro_1 Pro_2 Pro_3
0            1    Fruit     A  None     C
1            1      Veg     A  None  None
2            2      Veg  None     B  None
3            3    Fruit     A     B  None
4            3      Veg  None     B     C
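If the triples are unique, DataFrame.pivot can do the same reshape without the helper column a, because Products can serve as both the new column labels and the cell values. A sketch, assuming pandas 1.1 or later (where pivot accepts a list for index); like the helper-column answer, pivot raises ValueError on duplicate triples:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

# Products is used both as the new columns and as the values (assumes pandas >= 1.1)
wide = df.pivot(index=['Customer_ID', 'Category'], columns='Products', values='Products')

# Rename the product columns positionally, as in the answer
wide.columns = ['Pro_{}'.format(x) for x in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```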
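The problem here is that when we unstack, we get as many product columns as there are unique values in Products, so the new columns are not determined per customer ID and category. I am trying to create the columns based on the products at the group level. – owise

Is it possible to change the input data? – jezrael

What do you mean by changing the input data? – owise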
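One possible reading of the request in these comments (an assumption, not confirmed in the thread) is to number the products within each (Customer_ID, Category) group, so that Pro_1, Pro_2, ... are the first, second, ... products of that group and the number of columns equals the largest group size rather than the number of distinct products. A sketch using groupby().cumcount(); note that the result then differs from the question's example, where each Pro_ column corresponds to a fixed product:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

# Sort so the numbering inside each group is deterministic (alphabetical by product)
df = df.sort_values(['Customer_ID', 'Category', 'Products'])

# Position of each product within its group: 1, 2, ... (hypothetical helper column 'n')
df['n'] = df.groupby(['Customer_ID', 'Category']).cumcount() + 1

wide = df.pivot(index=['Customer_ID', 'Category'], columns='n', values='Products')
wide.columns = ['Pro_{}'.format(x) for x in wide.columns]
wide = wide.reset_index()
print(wide)
```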

Try this (never mind the IO part, it's just there for easy copy/paste):

import pandas as pd 
from io import StringIO 
df = pd.read_csv(StringIO(""" 
Customer_ID  Category Products 
    1    Veg   A 
    2    Veg   B 
    3    Fruit  A 
    3    Fruit  B 
    3    Veg   B 
    1    Fruit  A 
    3    Veg   C 
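    1    Fruit  C"""), sep=r'\s+')
# dtype=int keeps 0/1 indicators (recent pandas returns booleans by default)
df = df.join(pd.get_dummies(df['Products'], dtype=int))
# drop the string column so that only the indicator columns are summed
g = df.drop(columns='Products').groupby(['Customer_ID', 'Category']).sum()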
print(g) 

Output:

                      A  B  C
Customer_ID Category         
1           Fruit     1  0  1
            Veg       1  0  0
2           Veg       0  1  0
3           Fruit     1  1  0
            Veg       0  1  1
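The indicator table can be turned back into the layout asked for in the question, with the product name where it was bought and NaN elsewhere. A minimal sketch, assuming the 0/1 table g is built as in this answer (the names dummies and letters are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

# 0/1 indicator table per (Customer_ID, Category), as in the answer
dummies = pd.get_dummies(df['Products'], dtype=int)
g = df[['Customer_ID', 'Category']].join(dummies).groupby(['Customer_ID', 'Category']).sum()

# Put the column's product name where the count is positive; unmapped False becomes NaN
letters = pd.DataFrame({c: g[c].gt(0).map({True: c}) for c in g.columns})
letters.columns = ['Pro_{}'.format(x) for x in range(1, len(letters.columns) + 1)]
letters = letters.reset_index()
print(letters)
```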
Another option is crosstab:

pd.crosstab([df['Customer_ID'],df['Category']], df['Products']) 

Output:

Products              A  B  C
Customer_ID Category         
1           Fruit     1  0  1
            Veg       1  0  0
2           Veg       0  1  0
3           Fruit     1  1  0
            Veg       0  1  1

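After that, you can reset the index, as in the other solution, to get what you want:

df = pd.crosstab([df['Customer_ID'], df['Category']], df['Products']).reset_index()
print(df)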
Products  Customer_ID Category  A  B  C
0                   1    Fruit  1  0  1
1                   1      Veg  1  0  0
2                   2      Veg  0  1  0
3                   3    Fruit  1  1  0
4                   3      Veg  0  1  1
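crosstab can also fill the cells with product names instead of counts, by passing values and aggfunc. A sketch; values=df['Products'] and aggfunc='first' are choices made here, not part of the original answer. Using 'first' keeps one name per cell, so duplicate rows would not be concatenated:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

# Aggregate the product strings themselves; empty cells stay NaN (no fill with 0)
wide = pd.crosstab([df['Customer_ID'], df['Category']], df['Products'],
                   values=df['Products'], aggfunc='first')
wide.columns = ['Pro_{}'.format(x) for x in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```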
The crosstab is done over all products; how can we place the products at the customer ID and category level? – owise