You can use astype with the parameter 'category':
cols = ['age', 'income', 'student']
# Convert only the selected columns to the category dtype
for col in cols:
    df[col] = df[col].astype('category')
print(df.dtypes)
age                    category
income                 category
student                category
credit_rating            object
Class_buys_computer      object
dtype: object
If you need to convert all columns:
# Convert every column to the category dtype
for col in df.columns:
    df[col] = df[col].astype('category')
print(df.dtypes)
age                    category
income                 category
student                category
credit_rating          category
Class_buys_computer    category
dtype: object
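You need the loop because, at least in the pandas version used here, converting the whole DataFrame at once fails:
df = df.astype('category')
NotImplementedError: > 1 ndim Categorical are not supported at this time
See the pandas documentation about categorical data.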
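As a minimal sketch of a loop-free alternative: astype also accepts a column-to-dtype mapping, and to my knowledge more recent pandas releases (0.23 and later) accept astype('category') on a whole DataFrame. The small DataFrame below is made up for illustration, and the try/except fallback is only an assumption about how you might want to support older versions.

```python
import pandas as pd

# Hypothetical sample data, not the DataFrame from the question.
df = pd.DataFrame({
    "age": ["youth", "senior", "middle_aged"],
    "income": ["high", "low", "medium"],
    "student": ["no", "yes", "no"],
})

# Convert only selected columns by passing a column -> dtype mapping.
cols = ["age", "income"]
partial = df.astype({col: "category" for col in cols})

# Convert every column at once; assumed to work on newer pandas versions.
try:
    full = df.astype("category")
except NotImplementedError:
    # Fall back to the per-column loop on older versions.
    full = df.copy()
    for col in full.columns:
        full[col] = full[col].astype("category")

print(partial.dtypes)
print(full.dtypes)
```

Both calls return a new DataFrame and leave df unchanged, so assign the result back if you want to overwrite the original.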
Edit by comment:
If you need an ordered categorical, another solution is to use pandas.Categorical:
df['age']=pd.Categorical(df['age'],categories=["youth","middle_aged","senior"],ordered=True)
print (df.age)
0 youth
1 youth
2 middle_aged
3 senior
4 senior
5 senior
6 middle_aged
7 youth
8 youth
9 senior
10 youth
11 middle_aged
12 middle_aged
13 senior
Name: age, dtype: category
Categories (3, object): [youth < middle_aged < senior]
Then you can sort the DataFrame by the age column:
df = df.sort_values('age')
print(df)
            age  income  student  credit_rating  Class_buys_computer
0         youth    high       no           fair                   no
1         youth    high       no      excellent                   no
7         youth  medium       no           fair                   no
8         youth     low      yes           fair                  yes
10        youth  medium      yes      excellent                  yes
2   middle_aged    high       no           fair                  yes
6   middle_aged     low      yes      excellent                  yes
11  middle_aged  medium       no      excellent                  yes
12  middle_aged    high      yes           fair                  yes
3        senior  medium       no           fair                  yes
4        senior     low      yes           fair                  yes
5        senior     low      yes      excellent                   no
9        senior  medium      yes           fair                  yes
13       senior  medium       no      excellent                   no
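To make the effect of ordered=True easier to try out, here is a small self-contained sketch. The four-row DataFrame is invented for illustration; the category names follow the answer. It shows that sorting follows the declared category order rather than alphabetical order, and that comparisons against a category value become possible.

```python
import pandas as pd

# Hypothetical sample data using the same age labels as above.
df = pd.DataFrame({"age": ["senior", "youth", "middle_aged", "youth"]})

# Declare the logical order of the categories explicitly.
age_order = ["youth", "middle_aged", "senior"]
df["age"] = pd.Categorical(df["age"], categories=age_order, ordered=True)

# Sorting uses the category order: youth < middle_aged < senior.
sorted_df = df.sort_values("age")

# Ordered categoricals also support comparisons against a category value.
older_than_youth = df[df["age"] > "youth"]

print(sorted_df)
print(older_than_youth)
```

Without ordered=True the comparison would raise a TypeError, because an unordered categorical has no notion of one category being greater than another.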
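I need a solution in Python (pandas). –
R has built-in support for factors. Although pandas has a categorical dtype, many libraries require you to use dummy variables instead. You may need to use pandas' get_dummies or scikit-learn's OneHotEncoder. – ayhan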
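As a rough illustration of the dummy-variable approach mentioned in the comment, the sketch below uses pandas.get_dummies on a made-up DataFrame (the data is hypothetical; only the column names echo the question). scikit-learn's OneHotEncoder serves a similar purpose but is not shown here.

```python
import pandas as pd

# Hypothetical sample data, not the DataFrame from the question.
df = pd.DataFrame({
    "income": ["high", "low", "medium", "low"],
    "student": ["no", "yes", "no", "yes"],
})

# Expand each listed column into one indicator column per distinct value.
dummies = pd.get_dummies(df, columns=["income", "student"])

print(dummies.columns.tolist())
print(dummies)
```

Each new column is named after the original column and the value it indicates, for example income_low, which is the form many modeling libraries expect instead of a category dtype.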