UnicodeEncodeError using DecisionTree

My code is as follows:

# -*- coding: utf-8 -*- 
import pandas as pd 
from sklearn.model_selection import train_test_split 
from sklearn import tree 

Model_Dev_Val = pd.read_excel("data2.xlsx") 

target = Model_Dev_Val[['source_2']] 

model_train, model_test, y_train, y_test = train_test_split(Model_Dev_Val, target,test_size = 0.5, random_state = 40,stratify = target) 

clf = tree.DecisionTreeClassifier() 
clf = clf.fit(model_train,y_train)

But it throws an error:

UnicodeEncodeError: 'decimal' codec can't encode characters in position 0-2: invalid decimal Unicode string

data2.xlsx includes some Chinese text, and the data has been cleaned.


2017-03-16 idiot python

There may be a problem with the Chinese characters in your file. – PinkFluffyUnicorn

I thought about that. I got the correct data.xlsx from my boss, and now it fails with: ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). –

Then there is probably a 'NaN', an 'infinity' or a number that is too large in there. – PinkFluffyUnicorn

There could be a problem with the Chinese characters in your file.
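One way to check this is to list the columns that pandas did not read as numbers: DecisionTreeClassifier needs every feature to be numeric, so a column holding Chinese (or any other) text will fail. A minimal sketch, using a small hypothetical DataFrame in place of data2.xlsx (the columns `amount` and `city` are made up for illustration; only `source_2` comes from the question):

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("data2.xlsx"); column names are illustrative.
df = pd.DataFrame({
    "amount": [1.5, 2.0, 3.25],
    "city": ["北京", "上海", "北京"],
    "source_2": [0, 1, 0],
})

# Columns whose dtype is not numeric (typically "object" when they hold strings).
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", non_numeric)

# For each such column, show the distinct values that cannot be parsed as numbers.
for col in non_numeric:
    parsed = pd.to_numeric(df[col], errors="coerce")
    bad = df.loc[parsed.isna() & df[col].notna(), col].unique()
    print(col, "->", list(bad))
```

Any column reported here has to be dropped or encoded as numbers before it is passed to `fit`.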

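The other error you get (ValueError) probably means there is a NaN, an infinity or a number that is too large in your data. If the same code runs fine on your boss's computer, then the difference may be caused by your machine having less memory, a different Python version, a different scikit-learn version, or something else entirely.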
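The three cases named in the ValueError can be counted per column before fitting. A minimal sketch, assuming the features are already numeric; the DataFrame `X` and its columns `a` and `b` are hypothetical. The float32 limit matters because the message in the question refers to dtype('float32'), the type the tree converts its input to:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features; replace with the real feature DataFrame.
X = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.inf, 2.0, 1e40],
})

values = X.to_numpy(dtype=np.float64)
float32_max = np.finfo(np.float32).max

# Count missing and infinite entries per column.
print("NaN per column:", dict(zip(X.columns, np.isnan(values).sum(axis=0))))
print("inf per column:", dict(zip(X.columns, np.isinf(values).sum(axis=0))))

# Finite values whose magnitude exceeds what float32 can represent.
too_large = np.isfinite(values) & (np.abs(values) > float32_max)
print("too large for float32:", dict(zip(X.columns, too_large.sum(axis=0))))
```

Whether to drop the offending rows, impute them, or rescale the column is a modelling decision the check itself does not make.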


2017-03-16 09:49:47 PinkFluffyUnicorn

I'd like to ask: does sklearn support words, or only numbers? –

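Everything you use has to be converted to numbers. You can still use words, though, for example by assigning a different number to each word, or by representing your words with [one-hot encoding](https://en.wikipedia.org/wiki/One-hot). – PinkFluffyUnicorn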
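Both approaches can be done with pandas before calling `fit`. A minimal sketch on a hypothetical DataFrame (`amount` and `city` are illustrative columns; `source_2` is the target from the question): `pd.factorize` gives each distinct word its own integer, and `pd.get_dummies` produces one 0/1 column per word. The sketch also keeps the target out of the features; the question's code passes the whole `Model_Dev_Val`, including `source_2`, as `model_train`.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the Excel data; only "source_2" comes from the question.
df = pd.DataFrame({
    "amount": [1.5, 2.0, 3.25, 0.5],
    "city": ["北京", "上海", "北京", "广州"],
    "source_2": [0, 1, 0, 1],
})

y = df["source_2"]                 # target as a 1-D Series
X = df.drop(columns=["source_2"])  # features must not include the target

# Option 1: one distinct integer per word (integer/label encoding).
X_codes = X.copy()
X_codes["city"], city_labels = pd.factorize(X_codes["city"])

# Option 2: one-hot encoding, one 0/1 column per distinct word.
X_onehot = pd.get_dummies(X, columns=["city"], dtype=int)

# Either encoded frame is now all-numeric and can be fitted.
clf = DecisionTreeClassifier(random_state=0).fit(X_onehot, y)
print(X_onehot.columns.tolist())
print(clf.predict(X_onehot))
```

Integer codes impose an arbitrary order on the words, which a tree can still split on; one-hot avoids that at the cost of more columns. scikit-learn's own `OneHotEncoder` is an alternative that remembers the categories seen during training, which helps when the test split contains the same columns.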

This may also help you: [using string data in scikit-learn](http://scikit-learn.org/stable/faq.html#how-do-i-deal-with-string-data-or-trees-graphs) – PinkFluffyUnicorn

I don't know whether this will solve your problem, because I can't reproduce it, but you can try:

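# Python 2 only: reload() is a builtin there, and sys.setdefaultencoding
# is only reachable after reloading sys. Neither works in Python 3.
import sys
reload(sys)
sys.setdefaultencoding('utf8')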

I hope it helps.
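The 'decimal' codec message in the question is most likely how Python 2 reports that `float()` received a unicode string that is not a number, which is what happens when a column of Chinese text reaches scikit-learn. If so, changing the default encoding cannot help, since the text is still not numeric. A minimal sketch, assuming Python 3 (where `sys.setdefaultencoding` does not exist), of the same conversion failing with a `ValueError` instead:

```python
import sys

# Python 3: str is already Unicode, and there is no default-encoding switch.
print(hasattr(sys, "setdefaultencoding"))

# Converting Chinese text to a number fails regardless of any encoding setting.
try:
    float("中文字")
except ValueError as exc:
    print("ValueError:", exc)
```

The fix in either Python version is to encode or drop the text columns rather than to change encodings.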


2017-03-16 09:59:10 mebusy

Same problem, sry pal –
