2016-01-13

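Python classification and regression tree error

I am learning how to use decision trees in Python. I modified an example from the site below so that it imports a CSV file instead of using the iris dataset: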

http://machinelearningmastery.com/get-your-hands-dirty-with-scikit-learn-now/

Code:

import numpy as np 
import urllib 
from sklearn.tree import DecisionTreeClassifier 
from sklearn import tree 
from sklearn import datasets 
from sklearn import metrics 

# URL for the Pima Indians Diabetes dataset (UCI Machine Learning Repository) 
url = "http://goo.gl/j0Rvxq" 
# download the file 
raw_data = urllib.urlopen(url) 
# load the CSV file as a numpy matrix 
dataset = np.loadtxt(raw_data, delimiter=",") 
#print(dataset.shape) 
# separate the data from the target attributes 
X = dataset[:,0:7] 
y = dataset[:,8] 
# fit a CART model to the data 
model = DecisionTreeClassifier() 
model.fit(dataset.data, dataset.target) 
print model 

Error:

Traceback (most recent call last): 
    File "DatasetTest2.py", line 24, in <module> 
    model.fit(dataset.data, dataset.target) 
AttributeError: 'numpy.ndarray' object has no attribute 'target' 

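I don't know why this error occurs. If I use the iris dataset from the example, it works fine. Ultimately, I need to be able to run a decision tree on a CSV file.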

I also tried the following code, which produces the same error:

# Import Python Modules 
from sklearn.tree import DecisionTreeClassifier 
from sklearn import tree 
from sklearn import datasets 
from sklearn import metrics 
import pandas as pd 
import numpy as np 

#Import Data 
raw_data = pd.read_csv("DataTest1.csv") 
dataset = raw_data.as_matrix() 
#print dataset.shape 
#print dataset 
# separate the data from the target attributes 
X = dataset[:,[2,3,4,7,10]] 
y = dataset[:,[1]] 
#print X 
# fit a CART model to the data 
model = DecisionTreeClassifier() 
model.fit(dataset.data, dataset.target) 
print model 

Answer


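The dataset object imported in the example is not a plain table of data. It is a special object set up with attributes such as data and target so that it can be used as shown in the example. Your dataset is a plain numpy array, which has no target attribute, hence the AttributeError. If you have your own data, you need to decide which columns to use as the data and which as the target. You have already done that by defining X and y, so from your example it looks like you want to call model.fit(X, y).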
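A minimal sketch of the difference, written for Python 3 (the question's code is Python 2). The inline CSV text and its values are made up for illustration and stand in for the real file; the column layout (features first, label last) is likewise an assumption.

```python
import io

import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

# The bundled iris loader returns a container object that exposes
# the feature matrix as .data and the labels as .target.
iris = datasets.load_iris()
print(hasattr(iris, "target"))  # the attribute exists on this object

# Hypothetical CSV contents (illustrative values only):
# three feature columns followed by a class label in the last column.
csv_text = """\
1.0,85.0,66.0,0
8.0,183.0,64.0,1
1.0,89.0,66.0,0
0.0,137.0,40.0,1
5.0,116.0,74.0,0
3.0,78.0,50.0,1
"""

# np.loadtxt returns a plain ndarray. It has no .target attribute, and its
# .data attribute is the raw memory buffer, not a feature matrix.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(hasattr(dataset, "target"))

# Split the array by column position instead.
X = dataset[:, :-1]  # every column except the last
y = dataset[:, -1]   # last column, as a 1-D label vector

# Fit the tree on the arrays that were just separated.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
print(model.predict(X))
```

Two details in the question's snippets may also be worth checking. In the first one, dataset[:,0:7] selects columns 0 to 6 only, so column 7 is left out even though the label is read from column 8; if all eight feature columns are intended, the slice would be 0:8. In the second one, dataset[:,[1]] produces a two-dimensional column, whereas dataset[:,1] gives the one-dimensional label vector that is conventional for y.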