pivot_table with group and without value field

I have a pandas DataFrame url like this:

location dom_category 
3   'edu' 
3   'gov' 
3   'edu' 
4   'org' 
4   'others' 
4   'org'

and I want to turn this DataFrame into:

location edu gov org others 
3   2  1  0  0 
4   0  0  2  1

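where edu, gov, org and others hold the count of each category for that location. My code gives the right result, but I know it is not the most efficient way: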

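import numpy as np

url['val'] = 1  # helper column of ones, so summing it per cell counts rows
url_final = url.pivot_table(index=['location'], values='val', columns=['dom_category'],
                            aggfunc=np.sum, fill_value=0)  # 0 instead of NaN for absent pairs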
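To run the question's approach end to end, here is a self-contained sketch. It rebuilds the sample data from the question, assumes the quotes shown around the categories are literally part of the stored values (as the answers below also assume), and strips them first. It passes the string alias 'sum' rather than np.sum, because recent pandas releases may emit a FutureWarning when np.sum is given as aggfunc.

```python
import pandas as pd

# Sample data from the question (assumption: the quotes are part of the stored values)
url = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
url['dom_category'] = url['dom_category'].str.strip("'")

# Helper column of ones: summing it inside each (location, category) cell counts rows
url['val'] = 1
url_final = url.pivot_table(index='location', columns='dom_category', values='val',
                            aggfunc='sum', fill_value=0)
print(url_final)
```

The helper column is what the answers below avoid: they count rows directly instead of summing ones.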


2017-06-01 jarry jafery

First, if necessary, remove the ' characters with str.strip.
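A small illustration of that cleanup step on a hypothetical three-value Series. In Python, "\'" and "'" are the same one-character string, so either spelling works as the argument to str.strip.

```python
import pandas as pd

s = pd.Series(["'edu'", "'gov'", "'others'"])

# Both spellings denote the same single-quote character
same_char = "\'" == "'"

# str.strip removes the listed characters from both ends of each value
cleaned = s.str.strip("'")
print(same_char, cleaned.tolist())
```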

Then use groupby with the size aggregation and reshape with unstack:

df['dom_category'] = df['dom_category'].str.strip("\'") 
df = df.groupby(['location','dom_category']).size().unstack(fill_value=0) 
print (df) 
dom_category  edu  gov  org  others
location
3               2    1    0       0
4               0    0    2       1
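A self-contained sketch of the groupby + size + unstack answer, using the question's sample data (the quoted values are an assumption carried over from the question). Keeping the intermediate Series shows what unstack receives.

```python
import pandas as pd

df = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
df['dom_category'] = df['dom_category'].str.strip("'")

# size() counts rows per (location, dom_category) pair and returns a Series
# with a two-level index; unstack() pivots the dom_category level into columns
pairs = df.groupby(['location', 'dom_category']).size()
counts = pairs.unstack(fill_value=0)
print(counts)
```

fill_value=0 is what turns the missing pairs (for example location 3 with org) into 0 instead of NaN.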

Or use pivot_table:

df['dom_category'] = df['dom_category'].str.strip("\'") 
df = df.pivot_table(index='location', columns='dom_category', aggfunc='size', fill_value=0)
print(df)
dom_category  edu  gov  org  others
location
3               2    1    0       0
4               0    0    2       1
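A self-contained sketch of the pivot_table variant on the question's sample data (quoted values assumed, as above). Because no values= column is given and no other columns remain, aggfunc='size' simply counts rows per cell, so no helper column is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
df['dom_category'] = df['dom_category'].str.strip("'")

# No values= argument and no other columns left: 'size' counts the rows in each cell
counts = df.pivot_table(index='location', columns='dom_category',
                        aggfunc='size', fill_value=0)
print(counts)
```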

Finally, if needed, turn the index into a column and remove the columns name dom_category with reset_index + rename_axis:

df = df.reset_index().rename_axis(None, axis=1) 
print (df) 
   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1
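A self-contained sketch of this last step, again on the question's sample data (quoted values assumed). It checks the two effects separately: location moves from the index into a regular column, and the columns axis loses its name.

```python
import pandas as pd

df = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
df['dom_category'] = df['dom_category'].str.strip("'")
counts = df.groupby(['location', 'dom_category']).size().unstack(fill_value=0)

# reset_index() turns the location index into an ordinary column;
# rename_axis(None, axis=1) clears the leftover columns name 'dom_category'
flat = counts.reset_index().rename_axis(None, axis=1)
print(flat)
```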


2017-06-01 05:03:31 jezrael

Let's use str.strip, get_dummies and groupby:

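df['dom_category'] = df.dom_category.str.strip("\'")
# drop the text column first, so that sum() only adds up the 0/1 dummy columns
df.drop(columns='dom_category').assign(**df.dom_category.str.get_dummies()) \
    .groupby('location').sum().reset_index()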

Output:

   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1
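A self-contained sketch of the get_dummies answer on the question's sample data (quoted values assumed). The string column is dropped before summing because, in recent pandas, groupby().sum() no longer skips non-numeric columns by default and would concatenate the category strings instead.

```python
import pandas as pd

df = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
df['dom_category'] = df['dom_category'].str.strip("'")

# One 0/1 indicator column per category value
dummies = df['dom_category'].str.get_dummies()

# Drop the text column before summing, so only the indicator columns are added up
result = (df.drop(columns='dom_category')
            .assign(**dummies)
            .groupby('location').sum()
            .reset_index())
print(result)
```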


2017-06-01 04:58:37

'pd.get_dummies(df.dom_category).groupby(df.location).sum().reset_index()' – piRSquared

@piRSquared Thanks. –
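The one-liner from the comment above, made self-contained on the question's sample data (quoted values assumed). pd.get_dummies returns only the indicator columns, so nothing has to be dropped; dtype=int is added here as an assumption-free safeguard, since recent pandas returns booleans by default.

```python
import pandas as pd

df = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
df['dom_category'] = df['dom_category'].str.strip("'")

# Indicator columns only; dtype=int asks for 0/1 integers instead of booleans
result = (pd.get_dummies(df.dom_category, dtype=int)
            .groupby(df.location).sum()
            .reset_index())
print(result)
```

Grouping by the Series df.location (rather than a column name) works because the dummies frame shares df's row index.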

Use groupby and value_counts

Housekeeping: get rid of the '

df.dom_category = df.dom_category.str.strip("'")

Rest of the solution

df.groupby('location').dom_category.value_counts().unstack(fill_value=0) 

dom_category  edu  gov  org  others
location
3               2    1    0       0
4               0    0    2       1

To get the formatting just right

df.groupby('location').dom_category.value_counts().unstack(fill_value=0) \
    .reset_index().rename_axis(None, axis=1)

   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1
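A self-contained sketch of the value_counts answer, including the formatting step, on the question's sample data (quoted values assumed). The axis is passed to rename_axis by keyword, because recent pandas versions no longer accept it positionally.

```python
import pandas as pd

df = pd.DataFrame({
    'location': [3, 3, 3, 4, 4, 4],
    'dom_category': ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})
df['dom_category'] = df['dom_category'].str.strip("'")

# value_counts() within each location gives a Series indexed by (location, dom_category)
counts = df.groupby('location').dom_category.value_counts().unstack(fill_value=0)

# Move location into a column and drop the columns name left over from unstack
flat = counts.reset_index().rename_axis(None, axis=1)
print(flat)
```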


2017-06-01 05:52:01 piRSquared
