2017-06-13 65 views
4

I have a two-column table that contains a many-to-many relationship. How can I expand certain columns into rows using Python pandas?

For example:

Animal  Food 
rabbit  grass 
rabbit  carrots 
rabbit  cabbage 
dog  carrots 
horse  grass 
horse  hay 

I would like something like this:

Animal  Food1 Food2  Food3 
rabbit  grass carrots cabbage 
dog  carrots 
horse  grass hay 

Or like this:

Animal  Grass  Carrots Cabbage  Hay 
rabbit  True  True  True  False 
dog   False  True  False  False 
horse  True  False  False  True 

I have tried pivot_table, melt, and stack, but I still can't figure out how to do this. Any help would be greatly appreciated. Thanks!

+0

WEEE... there are so many ways. I love pandas!

Answers

4

Option 1
groupby + size

df.groupby(['Animal', 'Food']).size().unstack(fill_value=0).astype(bool) 

Food cabbage carrots grass hay 
Animal         
dog  False  True False False 
horse  False False True True 
rabbit  True  True True False 

s = df.groupby('Animal').Food.apply(list) 
pd.DataFrame(s.values.tolist(), s.index).add_prefix('Food').fillna('') 

      Food0 Food1 Food2 
Animal       
dog  carrots     
horse  grass  hay   
rabbit grass carrots cabbage 
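
The snippets in this answer assume the question's table is already loaded as `df`. Here is a minimal sketch of that setup, with the table typed in by hand; the names `flags` and `wide` are illustrative and not part of the answer:

```python
import pandas as pd

# The question's table, rebuilt by hand (assumed to match the poster's data).
df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# Boolean membership table: count each (Animal, Food) pair, then widen.
flags = df.groupby(["Animal", "Food"]).size().unstack(fill_value=0).astype(bool)

# One column per food slot: collect each animal's foods into a list first,
# then let the DataFrame constructor pad the shorter lists.
s = df.groupby("Animal").Food.apply(list)
wide = pd.DataFrame(s.values.tolist(), s.index).add_prefix("Food").fillna("")

print(flags)
print(wide)
```

Note that the list-based version numbers its columns from 0 (`Food0`, `Food1`, ...), because the column labels come from the list positions.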

Option 2
groupby + value_counts

df.groupby('Animal').Food.value_counts().unstack(fill_value=0).astype(bool) 

Food cabbage carrots grass hay 
Animal         
dog  False  True False False 
horse  False False True True 
rabbit  True  True True False 

Option 3
groupby + str.get_dummies

df.groupby('Animal').Food.apply('|'.join).str.get_dummies().astype(bool) 

     cabbage carrots grass hay 
Animal         
dog  False  True False False 
horse  False False True True 
rabbit  True  True True False 

Option 4
pandas.factorize + numpy.bincount

f1, u1 = pd.factorize(df.Animal.values) 
f2, u2 = pd.factorize(df.Food.values) 

n = u1.size 
m = u2.size 

b = np.bincount(f1 * m + f2, minlength=n * m).reshape(n, m) 

pd.DataFrame(b.astype(bool), u1, u2) 

     grass carrots cabbage hay 
rabbit True  True  True False 
dog  False  True False False 
horse True False False True 
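
The trick in Option 4 is that each (animal, food) pair is mapped to a single cell number in a flattened n-by-m grid, so that `np.bincount` can count all pairs in one pass. A worked sketch of the intermediate values, assuming the question's table typed in by hand (the names `flat`, `counts`, and `table` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# Integer codes in order of first appearance, plus the unique labels.
f1, u1 = pd.factorize(df.Animal.values)  # codes: 0, 0, 0, 1, 2, 2
f2, u2 = pd.factorize(df.Food.values)    # codes: 0, 1, 2, 1, 0, 3
n, m = u1.size, u2.size                  # 3 animals, 4 foods

# Row-major cell number of each pair in an n x m grid.
flat = f1 * m + f2                       # 0, 1, 2, 5, 8, 11

# Count hits per cell, then fold the flat counts back into the grid.
counts = np.bincount(flat, minlength=n * m).reshape(n, m)
table = pd.DataFrame(counts.astype(bool), u1, u2)
print(table)
```

Because `factorize` numbers labels in order of first appearance, the rows and columns come out in the original data order rather than sorted, which is why this output is ordered differently from the groupby versions.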

Option 5
Got bored... so here is one more

f, u = pd.factorize(df.Animal.values) 
n = u.size 

a = [[] for _ in range(n)] 
for i, food in zip(f, df.Food): 
    a[i].append(food) 
pd.DataFrame(a, u).rename(columns=lambda x: x+1).add_prefix('Food').fillna('') 

      Food1 Food2 Food3 
rabbit grass carrots cabbage 
dog  carrots     
horse  grass  hay   
4

I think you can use:

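# Number each animal's foods 1, 2, 3, ... and pivot on that counter 
df = (df.assign(n=df.groupby('Animal').cumcount().add(1).astype(str)) 
        .pivot(index='Animal', columns='n', values='Food') 
        .add_prefix('Food').fillna('') 
        .rename_axis(columns=None)) 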
print (df) 
      Food1 Food2 Food3 
Animal       
dog  carrots     
horse  grass  hay   
rabbit grass carrots cabbage 
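
The step that makes this pivot possible is `cumcount`, which numbers the rows within each animal so that every (Animal, counter) pair is unique. A small sketch showing that intermediate counter, assuming the question's table typed in by hand (the column name `slot` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# Running position of each row inside its Animal group, shifted to start at 1.
slot = df.groupby("Animal").cumcount().add(1)
print(slot.tolist())

# Pivot on the counter: one row per animal, one column per position.
wide = (df.assign(slot=slot)
          .pivot(index="Animal", columns="slot", values="Food")
          .add_prefix("Food")
          .fillna(""))
print(wide)
```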

Or:

df = pd.crosstab(df['Animal'], df['Food']).astype(bool) 
print (df) 
Food cabbage carrots grass hay 
Animal         
dog  False  True False False 
horse  False False True True 
rabbit  True  True True False 
3

You can use crosstab:

pd.crosstab(df.Animal, df.Food).astype(bool) 


Food cabbage carrots grass hay 
Animal    
dog  False True False False 
horse False False True True 
rabbit True True True False 
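
`pd.crosstab` returns occurrence counts, which is why the answers cast the result with `astype(bool)`. A short sketch of the difference, assuming the question's table plus one deliberately duplicated row (the duplicate is an illustration, not part of the poster's data):

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# Hypothetical duplicate: repeat the first row so rabbit/grass appears twice.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

counts = pd.crosstab(dup["Animal"], dup["Food"])  # integer counts per pair
flags = counts.astype(bool)                       # any nonzero count -> True

print(counts)
print(flags)
```

The cast keeps the result a plain yes/no table even when the input contains repeated pairs.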
4
df_out = df.set_index(['Animal','Food']).assign(Value=True).unstack().fillna(False) 
df_out.columns = df_out.columns.droplevel() 

Output:

Food cabbage carrots grass hay 
Animal         
dog  False  True False False 
horse  False False True True 
rabbit  True  True True False