2016-08-15 100 views
    age          income  student  credit_rating  Class_buys_computer
0   youth        high    no       fair           no
1   youth        high    no       excellent      no
2   middle_aged  high    no       fair           yes
3   senior       medium  no       fair           yes
4   senior       low     yes      fair           yes
5   senior       low     yes      excellent      no
6   middle_aged  low     yes      excellent      yes
7   youth        medium  no       fair           no
8   youth        low     yes      fair           yes
9   senior       medium  yes      fair           yes
10  youth        medium  yes      excellent      yes
11  middle_aged  medium  no       excellent      yes
12  middle_aged  high    yes      fair           yes
13  senior       medium  no       excellent      no

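How do I create categorical (factor) variables in Python?

I am using this dataset and would like variables such as age and income to behave like factor variables in R. How can I do this in Python?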


I need a solution in Python (pandas).


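R has built-in support for factors. pandas has a categorical dtype, but many libraries require you to use dummy variables instead. You may need pandas' get_dummies or scikit-learn's OneHotEncoder. – ayhan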
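As a rough illustration of the dummy-variable route mentioned in the comment, here is a minimal sketch using pandas' get_dummies. The four-row frame is just a hand-copied subset of the question's table, and the choice of columns to encode is an assumption made for the example.

```python
import pandas as pd

# Illustrative subset of the question's table (first four rows, three columns).
df = pd.DataFrame({
    "age": ["youth", "youth", "middle_aged", "senior"],
    "income": ["high", "high", "high", "medium"],
    "student": ["no", "no", "no", "no"],
})

# Replace each listed column with one indicator column per distinct value;
# columns that are not listed (here: student) are left unchanged.
dummies = pd.get_dummies(df, columns=["age", "income"])
print(dummies.columns.tolist())
```

Whether the dummy columns or the categorical dtype is the better fit depends on what the downstream library accepts.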

Answers


You can use astype with the argument 'category':

cols = ['age','income','student'] 

for col in cols: 
    df[col] = df[col].astype('category') 

print(df.dtypes)
age                    category
income                 category
student                category
credit_rating            object
Class_buys_computer      object
dtype: object
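To see what the conversion gives you in R-like terms, the sketch below inspects a categorical Series through its .cat accessor: categories plays roughly the role of R's levels, and codes holds the integer position of each value. The short Series is made up for the example.

```python
import pandas as pd

# A small stand-in for the age column.
s = pd.Series(["youth", "youth", "middle_aged", "senior"]).astype("category")

# Distinct values; without an explicit order they are sorted lexically.
levels = list(s.cat.categories)
# Integer position of each row's value within the categories.
codes = s.cat.codes.tolist()

print(levels)  # ['middle_aged', 'senior', 'youth']
print(codes)   # [2, 2, 0, 1]
```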

If you need to convert all columns:

for col in df.columns: 
    df[col] = df[col].astype('category') 

print(df.dtypes)
age                    category
income                 category
student                category
credit_rating          category
Class_buys_computer    category
dtype: object

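The loop is needed because, in the pandas version used for this answer, converting the whole DataFrame at once fails:

df = df.astype('category')

NotImplementedError: > 1 ndim Categorical are not supported at this time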
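If you would rather avoid the explicit loop, one possible alternative is to pass astype a mapping from column name to dtype, so each column is converted individually in a single call. This is a sketch on a made-up two-column frame, not part of the original answer; newer pandas releases may also accept df.astype('category') directly, so check the behaviour of your installed version.

```python
import pandas as pd

# Tiny made-up frame standing in for the question's data.
df = pd.DataFrame({
    "age": ["youth", "senior"],
    "income": ["high", "low"],
})

# Map every column name to the target dtype and convert in one call.
df = df.astype({col: "category" for col in df.columns})
print(df.dtypes)
```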

Pandas documentation about categorical

Edit, in response to a comment:

If you need an ordered categorical, another solution is to use pandas.Categorical:

df['age'] = pd.Categorical(df['age'], categories=["youth", "middle_aged", "senior"], ordered=True)

print(df.age)
0           youth
1           youth
2     middle_aged
3          senior
4          senior
5          senior
6     middle_aged
7           youth
8           youth
9          senior
10          youth
11    middle_aged
12    middle_aged
13         senior
Name: age, dtype: category
Categories (3, object): [youth < middle_aged < senior]

Then you can sort the DataFrame by the age column:

df = df.sort_values('age') 
print(df)
            age  income  student  credit_rating  Class_buys_computer
0         youth    high       no           fair                   no
1         youth    high       no      excellent                   no
7         youth  medium       no           fair                   no
8         youth     low      yes           fair                  yes
10        youth  medium      yes      excellent                  yes
2   middle_aged    high       no           fair                  yes
6   middle_aged     low      yes      excellent                  yes
11  middle_aged  medium       no      excellent                  yes
12  middle_aged    high      yes           fair                  yes
3        senior  medium       no           fair                  yes
4        senior     low      yes           fair                  yes
5        senior     low      yes      excellent                   no
9        senior  medium      yes           fair                  yes
13       senior  medium       no      excellent                   no
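Besides sorting, an ordered categorical also supports comparisons and min/max that follow the declared order rather than alphabetical order. A minimal sketch, using a short made-up Series in place of the full age column:

```python
import pandas as pd

# Ordered categorical with the same category order as in the answer.
age = pd.Series(pd.Categorical(
    ["youth", "senior", "middle_aged", "youth"],
    categories=["youth", "middle_aged", "senior"],
    ordered=True,
))

# Comparison against a category uses the declared order, not string order.
older = age[age > "youth"].tolist()
print(older)                 # ['senior', 'middle_aged']
print(age.min(), age.max())  # youth senior
```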

Is it possible to do it like this: youth


Yes, of course, give me a moment. – jezrael