Running SVM on the IRIS dataset and getting ValueError: Unknown label type: 'unknown'

Can anyone explain this problem to me in a simple way? I have included the complete code for convenience.

I have this code, which loads the IRIS dataset and runs an SVM:

from sklearn import svm 
import pandas as pd 


def prepare_iris_DS(): 
    print("Loading iris DS...") 
    url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' 
    iris = pd.read_csv(url, names=["sepal length", "sepal width", "petal length", "petal width", "Species"]) 
    df = pd.DataFrame(iris, columns=["sepal length", "sepal width", "petal length", "petal width", "Species"]) 

    df.head() 
    iris.head() 

    print("Iris DS is Loaded") 

    columns, labels = ["sepal length", "sepal width"], ["Iris-setosa", "Iris-virginica"] 

    total = df.shape[0] 
    df = df[df.Species.isin(labels)] 
    X = df[columns] 

    print("selected {0} entries out of {1} from the dataset based on labels {2}".format(len(X), total, str(labels))) 

    Y = df[["Species"]] 
    Y.loc[Y.Species != labels[0], 'Species'] = 0.0 
    Y.loc[Y.Species == labels[0], 'Species'] = 1.0 

    X = X.as_matrix() 
    Y = Y.as_matrix() 

    return X, Y 


X, Y = prepare_iris_DS() 

rbf_svc = svm.SVC(kernel='rbf', gamma=0.1, C=0.1) 
rbf_svc.fit(X, Y)

I keep getting an error on the last line, rbf_svc.fit(X, Y):

File "C:\Anaconda2\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'

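However, when I change the conversion of Y to the line below, it just works. I don't understand why, and I'd appreciate a clear, simple answer: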

Y = Y.as_matrix().astype(float)


2016-12-13 Samer Aamar

After Y = Y.as_matrix(), look at the data type of the target array:

>>> Y.dtype 
object

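SVC's fit method accepts class labels that are either numeric or strings. Your Y, however, is an object-dtype array whose elements are the Python floats 0.0 and 1.0. scikit-learn cannot infer what kind of target such an array holds, so check_classification_targets (the function in your traceback) gets the target type 'unknown' and raises the error. X is not the problem here; Y is.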
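To see this classification step on its own, the sketch below calls type_of_target from sklearn.utils.multiclass (the helper that check_classification_targets calls) on three small arrays. The label values are arbitrary and purely illustrative; only the dtype and the element type differ between them.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Floats stored as generic Python objects: this is what Y looks like after as_matrix()/to_numpy().
labels_object = np.array([0.0, 1.0, 1.0, 0.0], dtype=object)
# The same values with a real numeric dtype.
labels_float = labels_object.astype(float)
# String labels are accepted too, even though the array dtype is object.
labels_string = np.array(["Iris-setosa", "Iris-virginica"], dtype=object)

print(type_of_target(labels_object))  # unknown
print(type_of_target(labels_float))   # binary
print(type_of_target(labels_string))  # binary
```

An object array is only accepted when its elements are strings; object-wrapped numbers fall through to 'unknown'.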

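This happens because Y inherits the dtype of df[["Species"]] when you assign it directly. So even though you used boolean indexing with loc to replace the string values with the numbers 0.0 and 1.0, the dtype of Y is unaffected and remains object.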
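A minimal pandas-only sketch of this behaviour, assuming a hypothetical two-row frame whose Species column is built with dtype=object to mirror the question's Species column. The values change, but the column dtype stays object until you cast it explicitly.

```python
import pandas as pd

# Hypothetical two-row target frame; dtype=object mirrors the question's Species column.
Y = pd.DataFrame({"Species": pd.Series(["Iris-setosa", "Iris-virginica"], dtype=object)})

# Same relabelling pattern as in the question.
Y.loc[Y.Species != "Iris-setosa", "Species"] = 0.0
Y.loc[Y.Species == "Iris-setosa", "Species"] = 1.0

print(Y["Species"].tolist())             # [1.0, 0.0]
print(Y["Species"].dtype)                # object: the values changed, the dtype did not
print(Y["Species"].astype(float).dtype)  # float64 after an explicit cast
```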

So Y has to be cast to an int or float dtype, which fit can then interpret:

Y = Y.as_matrix().astype(float).ravel() # ravel to flatten the 2D array to 1D

Now check the dtype again:

>>> Y.dtype 
float64
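The sketch below puts the failing and the fixed conversion side by side. It assumes six hand-typed sample rows (illustrative values) instead of the UCI download, and uses to_numpy() because DataFrame.as_matrix() was removed in pandas 1.0; on older pandas, as_matrix() returns the same array here.

```python
import numpy as np
import pandas as pd
from sklearn import svm

# Hypothetical sample rows standing in for the UCI file.
df = pd.DataFrame({
    "sepal length": [5.1, 4.9, 4.7, 6.3, 5.8, 7.1],
    "sepal width": [3.5, 3.0, 3.2, 3.3, 2.7, 3.0],
    "Species": pd.Series(["Iris-setosa"] * 3 + ["Iris-virginica"] * 3, dtype=object),
})

X = df[["sepal length", "sepal width"]].to_numpy()  # float64 already
Y = df[["Species"]].copy()
Y.loc[Y.Species != "Iris-setosa", "Species"] = 0.0
Y.loc[Y.Species == "Iris-setosa", "Species"] = 1.0

Y_object = Y.to_numpy()                       # shape (6, 1), dtype object: rejected by fit()
Y_fixed = Y.to_numpy().astype(float).ravel()  # shape (6,), dtype float64

error_message = None
try:
    svm.SVC(kernel="rbf", gamma=0.1, C=0.1).fit(X, Y_object)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # starts with "Unknown label type"

clf = svm.SVC(kernel="rbf", gamma=0.1, C=0.1).fit(X, Y_fixed)
print(Y_fixed.dtype, Y_fixed.shape)  # float64 (6,)
print(clf.classes_)                  # [0. 1.]
```

The .ravel() is not what fixes the error; it only turns the (n, 1) column into the 1-D shape scikit-learn expects for y.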

In addition, you can also make the following changes:

X = df[columns].copy() 
Y = df[["Species"]].copy()

Taking an explicit copy of each DataFrame slice, rather than assigning it directly, avoids the SettingWithCopyWarning.
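If all you need is a numeric 0/1 target, one alternative (not part of the original answer) is to build the labels in a single vectorized expression instead of mutating a slice with loc. That sidesteps both the SettingWithCopyWarning and the object dtype. A minimal sketch, again with hypothetical sample rows:

```python
import pandas as pd

labels = ["Iris-setosa", "Iris-virginica"]
# Hypothetical sample rows (illustrative values).
df = pd.DataFrame({
    "sepal length": [5.1, 4.9, 6.3, 5.8],
    "sepal width": [3.5, 3.0, 3.3, 2.7],
    "Species": pd.Series(
        ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"], dtype=object
    ),
})

df = df[df.Species.isin(labels)]  # same filtering step as in the question
X = df[["sepal length", "sepal width"]].to_numpy()
# Boolean comparison cast to float: 1.0 for labels[0], 0.0 otherwise; nothing is assigned into a slice.
Y = (df["Species"] == labels[0]).astype(float).to_numpy()

print(Y)        # [1. 1. 0. 0.]
print(Y.dtype)  # float64
```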


2016-12-13 20:01:18

Thanks for the great answer. But why don't I need astype(float) when converting X to a matrix? I mean this line: X = X.as_matrix()

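You don't need any type conversion for X simply because the subset of the DataFrame you assigned to X already contains floats: the measurement columns (sepal length, sepal width, petal length, petal width) are all of dtype float64. The Species column is not. That is why only Y needs to be cast to a numeric type.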

Thanks a lot. This was very helpful!
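A small sketch of the point made in the comments about X versus Y, assuming two sample lines in the same comma-separated layout as iris.data (illustrative values), read from an in-memory string instead of the URL and converted with to_numpy() in place of the removed as_matrix():

```python
import io

import pandas as pd

# Two hypothetical lines in the iris.data layout: four measurements, then the species name.
csv_text = "5.1,3.5,1.4,0.2,Iris-setosa\n6.3,3.3,6.0,2.5,Iris-virginica\n"
names = ["sepal length", "sepal width", "petal length", "petal width", "Species"]
df = pd.read_csv(io.StringIO(csv_text), names=names)

X = df[["sepal length", "sepal width"]].to_numpy()  # numeric columns parse as float64
Y = df[["Species"]].to_numpy()                      # text column comes back as an object array

print(X.dtype)  # float64
print(Y.dtype)  # object
```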
