The problem is not the same (see 8381), but in my pandas version 0.18.1
it works fine.
I think you can change `False` to `True` and then call `reset_index`:
df_Company = (df1.groupby(by=['manufacturer', 'quality_issue'], as_index=True)['quality_issue2']
              .count()
              .reset_index())
Differences between `size` and `count` (see differences with numeric values):
Sample with string values:
import pandas as pd
import numpy as np
df1=pd.DataFrame([['foo','foo','bar','bar','bar','oats'],
['foo','foo','bar','bar','bar','oats'],
[None,'foo','bar',None,'bar','oats']]).T
df1.columns=['manufacturer','quality_issue','quality_issue2']
print (df1)
manufacturer quality_issue quality_issue2
0 foo foo None
1 foo foo foo
2 bar bar bar
3 bar bar None
4 bar bar bar
5 oats oats oats
df_Company = (df1.groupby(by=['manufacturer', 'quality_issue'], as_index=False)['quality_issue2']
              .count())
print (df_Company)
manufacturer quality_issue quality_issue2
0 bar bar 2
1 foo foo 1
2 oats oats 1
df_Company1 = (df1.groupby(by=['manufacturer', 'quality_issue'])['quality_issue2']
               .size()
               .reset_index(name='quality_issue2'))
print (df_Company1)
manufacturer quality_issue quality_issue2
0 bar bar 3
1 foo foo 2
2 oats oats 1
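To see both numbers side by side, a minimal sketch (assuming a pandas version where `GroupBy.agg` accepts the method names `'size'` and `'count'` as strings) computes them in one call on the same sample data. The `None` rows show up as the gap between the two columns:

```python
import pandas as pd

# Same sample as above, built from a dict instead of a transposed list
df1 = pd.DataFrame({
    'manufacturer': ['foo', 'foo', 'bar', 'bar', 'bar', 'oats'],
    'quality_issue': ['foo', 'foo', 'bar', 'bar', 'bar', 'oats'],
    'quality_issue2': [None, 'foo', 'bar', None, 'bar', 'oats'],
})

# 'size' counts every row in the group; 'count' skips missing values (None/NaN)
both = (df1.groupby(['manufacturer', 'quality_issue'])['quality_issue2']
           .agg(['size', 'count'])
           .reset_index())
print(both)
```

For `bar` this should give a size of 3 and a count of 2, matching the two separate outputs above.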
I think you can omit `['quality_issue2']`; the output is the same:
df_Company1 = (df1.groupby(by=['manufacturer', 'quality_issue'])
               .size()
               .reset_index(name='quality_issue2'))
print (df_Company1)
manufacturer quality_issue quality_issue2
0 bar bar 3
1 foo foo 2
2 oats oats 1
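Whether you keep `as_index=False` or switch to the default `as_index=True` plus `reset_index`, the result should be the same frame. A small check, assuming the same sample data, uses `pandas.testing.assert_frame_equal`, which raises if values, dtypes, columns or index differ:

```python
import pandas as pd

df1 = pd.DataFrame({
    'manufacturer': ['foo', 'foo', 'bar', 'bar', 'bar', 'oats'],
    'quality_issue': ['foo', 'foo', 'bar', 'bar', 'bar', 'oats'],
    'quality_issue2': [None, 'foo', 'bar', None, 'bar', 'oats'],
})
keys = ['manufacturer', 'quality_issue']

# as_index=False keeps the group keys as ordinary columns
flat = df1.groupby(keys, as_index=False)['quality_issue2'].count()

# as_index=True (the default) puts the keys in a MultiIndex;
# reset_index moves them back into columns
via_reset = df1.groupby(keys)['quality_issue2'].count().reset_index()

# Raises AssertionError if the two frames are not equal
pd.testing.assert_frame_equal(flat, via_reset)
print(via_reset)
```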
By the way, do you need `count`? Not `size`? – jezrael
[Differences](http://stackoverflow.com/a/33346694/2901002): ['size'](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html#pandas.core.groupby.GroupBy.size) counts `NaN` values, ['count'](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.count.html#pandas.core.groupby.GroupBy.count) does not – jezrael
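A minimal numeric sketch of that difference, using a hypothetical two-column frame (`a`, `b` are made-up names): per group, `size` minus `count` equals the number of `NaN` values in the selected column.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample: group 'x' has one NaN in column 'b'
df = pd.DataFrame({'a': ['x', 'x', 'y'], 'b': [1.0, np.nan, 3.0]})

g = df.groupby('a')['b']
sizes = g.size()    # rows per group, NaN rows included
counts = g.count()  # non-NaN values per group

# Missing values per group, computed independently for comparison
missing = df['b'].isna().groupby(df['a']).sum()
print(pd.DataFrame({'size': sizes, 'count': counts, 'missing': missing}))
```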
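What I'm trying to do is group by manufacturer and see which issues each manufacturer has, then count how many times each manufacturer has each of these quality_issues. So I think it's better to use count rather than size (right?). Basically, the quality_issue and quality_issue2 columns contain exactly the same data. – Morganis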
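If `quality_issue2` really is an exact copy of `quality_issue` with no missing values (an assumption based on the comment, not checked against the real data), `count` and `size` should give the same numbers; they only diverge when the column has `None`/`NaN`, as in the `None` rows of the sample above. A minimal sketch:

```python
import pandas as pd

# Assumed layout: quality_issue2 is an exact copy of quality_issue (no missing values)
df = pd.DataFrame({
    'manufacturer': ['foo', 'foo', 'bar', 'bar', 'bar', 'oats'],
    'quality_issue': ['foo', 'foo', 'bar', 'bar', 'bar', 'oats'],
})
df['quality_issue2'] = df['quality_issue']

keys = ['manufacturer', 'quality_issue']
by_count = df.groupby(keys)['quality_issue2'].count()  # non-null values per group
by_size = df.groupby(keys).size()                      # rows per group

# With no missing values both give rows per (manufacturer, quality_issue)
print(by_count.tolist(), by_size.tolist())
```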