Update a value in a pandas DataFrame if it exists

I have a CSV file like this:

word, tag, counter 
I, Subject, 1 
Love, Verb, 3 
Love, Adjective, 1

I want to create a DataFrame with one column for the word and one column per tag, like this:

Word  Subject  Verb  Adjective
I           1     0          0
Love        0     3          1

How can I do this with pandas?

Source

2017-02-23 Saber Alex

You can use pivot:

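# Reshape long to wide: one row per word, one column per tag.
# Missing (word, tag) pairs become NaN, so fill them with 0 and cast back to int.
df = df.pivot(index='word', columns='tag', values='counter').fillna(0).astype(int)
print(df)
tag   Adjective  Subject  Verb
word
I             0        1     0
Love          1        0     3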
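For anyone who wants to try this end to end, here is a minimal sketch that starts from the CSV text in the question. The data is inlined through `io.StringIO` rather than read from a file, and `skipinitialspace=True` is an assumption about how the file is laid out (a blank after each comma); the variable names `csv_text` and `wide` are only for illustration.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
csv_text = """word, tag, counter
I, Subject, 1
Love, Verb, 3
Love, Adjective, 1
"""

# skipinitialspace=True drops the blank that follows each comma,
# so the column names come out as 'word', 'tag', 'counter'.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Long to wide: words become the index, tags become the columns.
wide = (
    df.pivot(index="word", columns="tag", values="counter")
      .fillna(0)      # (word, tag) pairs absent from the CSV become 0
      .astype(int)    # NaN forced the columns to float, so cast back
)
print(wide)
```

Without `skipinitialspace=True` the column names and tag values would keep their leading blank, and `columns='tag'` would raise a `KeyError`.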

Another solution with set_index and unstack:

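# Move word and tag into a MultiIndex, then unstack the tag level into columns.
# fill_value=0 fills missing pairs directly, so the int dtype is kept.
df = df.set_index(['word','tag'])['counter'].unstack(fill_value=0)
print(df)
tag   Adjective  Subject  Verb
word
I             0        1     0
Love          1        0     3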
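Both approaches sort the tag columns alphabetically and leave `word` as the index. If the exact layout from the question is wanted (a regular word column followed by Subject, Verb, Adjective), something along these lines should work. The explicit column list is an assumption taken from the question's desired output, and `wide` and `result` are illustrative names.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "word": ["I", "Love", "Love"],
    "tag": ["Subject", "Verb", "Adjective"],
    "counter": [1, 3, 1],
})

wide = df.set_index(["word", "tag"])["counter"].unstack(fill_value=0)

# Put the tag columns in the order shown in the question,
# then turn the 'word' index back into an ordinary column.
result = wide.reindex(columns=["Subject", "Verb", "Adjective"]).reset_index()

# Drop the leftover 'tag' label on the column axis so it is not printed.
result.columns.name = None
print(result)
```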

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then you need to aggregate the duplicates with some aggfunc in pivot_table:

print(df)
   word        tag  counter
0     I    Subject        1
1  Love       Verb        3
2  Love  Adjective        1   <- duplicate for Love and Adjective
3  Love  Adjective        3   <- duplicate for Love and Adjective

# aggfunc decides how duplicate (word, tag) pairs are combined; here they are averaged.
df = df.pivot_table(index='word',
                    columns='tag',
                    values='counter',
                    aggfunc='mean',
                    fill_value=0)
print(df)
tag   Adjective  Subject  Verb
word
I             0        1     0
Love          2        0     3
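To see which rows trigger the error before deciding how to aggregate, a quick check like the following may help. It is only a sketch: `dupes` and `pivot_failed` are illustrative names, and the data repeats the four-row sample above.

```python
import pandas as pd

# Four-row sample with a repeated (Love, Adjective) pair.
df = pd.DataFrame({
    "word": ["I", "Love", "Love", "Love"],
    "tag": ["Subject", "Verb", "Adjective", "Adjective"],
    "counter": [1, 3, 1, 3],
})

# keep=False marks every member of a duplicated (word, tag) pair,
# not just the second and later occurrences.
dupes = df[df.duplicated(["word", "tag"], keep=False)]
print(dupes)

# pivot() has no way to combine the two values, so it raises ValueError.
pivot_failed = False
try:
    df.pivot(index="word", columns="tag", values="counter")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)
```

If `dupes` is empty, plain `pivot` is enough; otherwise an aggregating method such as `pivot_table` is needed.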

Another solution with groupby and unstack:

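# Average the duplicates per (word, tag) pair, then unstack the tag level into columns.
# mean() returns floats, so the result is printed with decimals.
df = df.groupby(['word','tag'])['counter'].mean().unstack(fill_value=0)
print(df)
tag   Adjective  Subject  Verb
word
I           0.0      1.0   0.0
Love        2.0      0.0   3.0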
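Whether `mean` is the right aggregation depends on what the counter represents. If the duplicates are partial counts that should be added together (an assumption, since the question does not say), the same pattern works with `sum`, which also keeps the values as integers whereas `mean()` yields floating-point values. A minimal sketch, with `totals` as an illustrative name:

```python
import pandas as pd

# Four-row sample with a repeated (Love, Adjective) pair.
df = pd.DataFrame({
    "word": ["I", "Love", "Love", "Love"],
    "tag": ["Subject", "Verb", "Adjective", "Adjective"],
    "counter": [1, 3, 1, 3],
})

# Assumption: duplicate counters should be added, not averaged.
totals = df.groupby(["word", "tag"])["counter"].sum().unstack(fill_value=0)
print(totals)
```

Here the two Love/Adjective rows (1 and 3) are combined into 4 rather than averaged to 2.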

Source

2017-02-23 14:01:45 jezrael
