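GridSearch for multilabel classification in Scikit-learn
I am trying to run a grid search for the best hyperparameters within each fold of a ten-fold cross-validation. This worked fine with my previous multiclass classification work, but it does not work this time with a multilabel problem.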
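import numpy as np
from sklearn.cross_validation import train_test_split  # older module layout, matching the traceback below
from sklearn.grid_search import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
clf = OneVsRestClassifier(LinearSVC())
C_range = 10.0 ** np.arange(-2, 9)
# The wrapped LinearSVC is reached as estimator__C (there is no "clf" step here)
param_grid = dict(estimator__C=C_range)
clf = GridSearchCV(clf, param_grid)
clf.fit(X_train, y_train)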
I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-65-dcf9c1d2e19d> in <module>()
6
7 clf = GridSearchCV(clf, param_grid)
----> 8 clf.fit(X_train, y_train)
/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y)
595
596 """
--> 597 return self._fit(X, y, ParameterGrid(self.param_grid))
598
599
/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y,
parameter_iterable)
357 % (len(y), n_samples))
358 y = np.asarray(y)
--> 359 cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
360
361 if self.verbose > 0:
/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _check_cv(cv, X,
y, classifier, warn_mask)
1365 needs_indices = None
1366 if classifier:
-> 1367 cv = StratifiedKFold(y, cv, indices=needs_indices)
1368 else:
1369 if not is_sparse:
/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self,
y, n_folds, indices, shuffle, random_state)
427 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
428 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429 label_test_folds = test_folds[y == label]
430 # the test split can be too big because we used
431 # KFold(max(c, self.n_folds), self.n_folds) instead of
ValueError: boolean index array should have 1 dimension
I guess this refers to the dimensionality or the format of the label indicator matrix.
print X_train.shape, y_train.shape
which prints:
(147, 1024) (147, 6)
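The failing line in the traceback, `test_folds[y == label]`, can be imitated with plain NumPy. The following is a minimal sketch with a hypothetical toy label matrix, not the real data: `test_folds` holds one entry per sample, while `y == label` has one entry per sample and per label, so the mask has one dimension too many. Recent NumPy versions may report this as an `IndexError` with a different message than the `ValueError` shown above.

```python
import numpy as np

# Hypothetical toy label-indicator matrix: 4 samples, 3 labels.
y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])

# One fold id per sample, like the 1-D array in the traceback.
test_folds = np.zeros(len(y), dtype=int)

mask = (y == 1)  # 2-D boolean mask, shape (4, 3)
try:
    test_folds[mask]  # 1-D array indexed with a 2-D mask
    failed = False
except (IndexError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# With a single label column the mask is 1-D and the indexing works.
single = test_folds[y[:, 0] == 1]
print(single.shape)
```

With a 1-D target, as in the multiclass case, the same indexing succeeds, which is consistent with the grid search having worked before.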
It seems that GridSearchCV uses StratifiedKFold internally. The problem arises when the stratified K-fold strategy is applied to a multilabel problem.
StratifiedKFold(y_train, 10)
gives
ValueError Traceback (most recent call last)
<ipython-input-87-884ffeeef781> in <module>()
----> 1 StratifiedKFold(y_train, 10)
/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self,
y, n_folds, indices, shuffle, random_state)
427 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
428 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429 label_test_folds = test_folds[y == label]
430 # the test split can be too big because we used
431 # KFold(max(c, self.n_folds), self.n_folds) instead of
ValueError: boolean index array should have 1 dimension
For now, using the conventional K-fold strategy works fine. Is there any way to apply stratified K-fold to multilabel classification?
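The conventional K-fold workaround could look roughly like the sketch below. It assumes a recent scikit-learn, where `GridSearchCV` and `KFold` live in `sklearn.model_selection` (the traceback above comes from the older `sklearn.grid_search` and `sklearn.cross_validation` modules, whose signatures differ). The data is synthetic and only mimics the 147 x 6 label shape; the feature count and the `C` range are reduced for speed and are assumptions, not the original settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data: 147 samples, 6 labels.
rng = np.random.RandomState(0)
X_train = rng.randn(147, 20)
# Labels depend on the features so that every column contains both classes.
y_train = (np.dot(X_train, rng.randn(20, 6)) > 0).astype(int)

# estimator__C addresses C of the LinearSVC wrapped by OneVsRestClassifier.
param_grid = {"estimator__C": list(10.0 ** np.arange(-2, 3))}

# Passing an explicit, non-stratified splitter avoids any stratification on y.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(OneVsRestClassifier(LinearSVC()), param_grid, cv=cv)
search.fit(X_train, y_train)
print(search.best_params_)
```

Since the splitter is given explicitly, the folds are drawn without looking at the label matrix, so the label distribution may vary from fold to fold.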
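Thanks for your comment, I noticed the problem and updated the thread. Thanks also for sharing the paper, I will go through it. – Francis 2014-09-26 06:06:53
I just came up with an idea: wouldn't it make sense to do a stratified split for each class separately? Since 'GridSearchCV' is wrapping a 'OneVsRestClassifier', why can't it treat each class on its own to yield 'L' binary problems, so that a stratified split can be applied to each of the 'L' problems? – Francis 2015-11-18 14:29:56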
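The idea in the last comment can be approximated by hand. The sketch below is one possible reading of it, not something `GridSearchCV` does on its own: each label column is treated as a separate binary problem and tuned with its own stratified split. It assumes a recent scikit-learn (`sklearn.model_selection`) and uses synthetic data; the parameter values are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 147 samples, 6 binary labels.
rng = np.random.RandomState(0)
X = rng.randn(147, 20)
Y = (np.dot(X, rng.randn(20, 6)) > 0).astype(int)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
best_C = []
for j in range(Y.shape[1]):
    # A single label column is a 1-D binary target, so stratification works.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    search = GridSearchCV(LinearSVC(), param_grid, cv=cv)
    search.fit(X, Y[:, j])
    best_C.append(search.best_params_["C"])

print(best_C)  # one selected C per label
```

Note that this selects a separate C for every label, whereas a grid search over a 'OneVsRestClassifier' picks one shared value, so the two approaches are not equivalent.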