2014-03-31

SKLearn cross-validation error - TypeError

I am trying to cross-validate the results of my KNN classifier. I used the code below, and it raises a TypeError.

For context, I have already imported the scikit-learn, NumPy, and pandas libraries.

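from sklearn.cross_validation import cross_val_score, ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

n_samples = len(y)
knn = KNeighborsClassifier(3)  # k-nearest neighbors with k = 3
# 10 random train/test splits, each holding out 30% of the samples
cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)

test_scores = cross_val_score(knn, X, y, cv=cv)
test_scores.mean()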

It returns:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-139-d8cc3ee0c29b> in <module>()
      7 cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)
      8
      9 test_scores = cross_val_score(knn, X, y, cv=cv)
     10 test_scores.mean()

//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
   1150         delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test,
   1151                                   verbose, fit_params)
   1152         for train, test in cv)
   1153     return np.array(scores)
   1154

//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    515         try:
    516             for function, args, kwargs in iterable:
    517                 self.dispatch(function, args, kwargs)
    518
    519             self.retrieve()

//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
    310         """
    311         if self._pool is None:
    312             job = ImmediateApply(func, args, kwargs)
    313             index = len(self._jobs)
    314             if not _verbosity_filter(index, self.verbose):

//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
    134         # Don't delay the application, to avoid keeping the input
    135         # arguments in memory
    136         self.results = func(*args, **kwargs)
    137
    138     def get(self):

//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _cross_val_score(estimator, X, y, scorer, train, test, verbose, fit_params)
   1056         y_test = None
   1057     else:
   1058         y_train = y[train]
   1059         y_test = y[test]
   1060     estimator.fit(X_train, y_train, **fit_params)

TypeError: only integer arrays with one element can be converted to an index

Please specify whether your y variable is derived from a pandas.DataFrame. – eickenberg

Answer


This error is related to pandas. scikit-learn expects NumPy arrays, sparse matrices, or objects that behave like them.

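The main problem with pandas DataFrames is that square-bracket indexing selects columns rather than rows. Row indexing in pandas is done through DataFrame.loc[...]. This is unexpected behavior for sklearn. The error probably comes from line 1058 of the traceback (y_train = y[train]), where the code fails to extract the training samples.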
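To make the indexing difference concrete, here is a minimal sketch on made-up data. The column names, index labels, and the `train` array are hypothetical and only stand in for what a cross-validation splitter would hand to the estimator; the exact exception raised by the bracket lookup may also differ between pandas versions, so the sketch catches both likely types.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and index labels are made up.
df = pd.DataFrame(
    {"feature": [1.0, 2.0, 3.0, 4.0], "label": [0, 1, 0, 1]},
    index=[10, 20, 30, 40],
)

# Positional row indices, like the ones a CV splitter yields.
train = np.array([0, 2])

# Square brackets on a DataFrame look up columns, so row positions
# are not valid keys and the lookup fails.
try:
    df[train]
    bracket_lookup_failed = False
except (KeyError, IndexError):
    bracket_lookup_failed = True

# .iloc selects rows by position, .loc selects rows by index label.
rows_by_position = df.iloc[train]
rows_by_label = df.loc[[10, 30]]

# A plain NumPy array is indexed by position, which is what sklearn expects.
y_array = df["label"].values
y_train = y_array[train]
```

Both `rows_by_position` and `rows_by_label` pick the same two rows here, while the bare `df[train]` lookup does not work at all.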

To work around this, if your y is a DataFrame column, try converting the column to an array:

y = y.values 

Otherwise, the sklearn-pandas package may be an option.
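Putting the array conversion together with the cross-validation call, a sketch of the whole workflow might look like the following. It assumes a recent scikit-learn release, where the splitters live in `sklearn.model_selection` and `ShuffleSplit` takes `n_splits` instead of the `n_samples`/`n_iter` arguments used in the question. The DataFrame, its column names, and the target rule are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data set: two random features and a made-up binary target.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(60, 2), columns=["f1", "f2"])
df["target"] = (df["f1"] + df["f2"] > 1.0).astype(int)

# Convert to NumPy arrays before handing the data to scikit-learn.
X = df[["f1", "f2"]].values
y = df["target"].values

knn = KNeighborsClassifier(n_neighbors=3)
# 10 random splits, each holding out 30% of the samples for testing.
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

test_scores = cross_val_score(knn, X, y, cv=cv)
mean_score = test_scores.mean()
print(mean_score)
```

On the old `sklearn.cross_validation` API from the question, the same `.values` conversion applies; only the import path and the `ShuffleSplit` arguments differ.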