GridSearchCV with StratifiedKFold

I want to run GridSearchCV on a RandomForestClassifier, but my data is imbalanced, so I am using StratifiedKFold:

from sklearn.model_selection import StratifiedKFold 
from sklearn.grid_search import GridSearchCV 
from sklearn.ensemble import RandomForestClassifier 

param_grid = {'n_estimators':[10, 30, 100, 300], "max_depth": [3, None], 
      "max_features": [1, 5, 10], "min_samples_leaf": [1, 10, 25, 50], "criterion": ["gini", "entropy"]} 

rfc = RandomForestClassifier() 

clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train)

But I get an error:

TypeError         Traceback (most recent call last) 
<ipython-input-597-b08e92c33165> in <module>() 
    9 rfc = RandomForestClassifier() 
    10 
---> 11 clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train) 

c:\python34\lib\site-packages\sklearn\grid_search.py in fit(self, X, y) 
    811 
    812   """ 
--> 813   return self._fit(X, y, ParameterGrid(self.param_grid)) 

c:\python34\lib\site-packages\sklearn\grid_search.py in _fit(self, X, y, parameter_iterable) 
    559          self.fit_params, return_parameters=True, 
    560          error_score=self.error_score) 
--> 561     for parameters in parameter_iterable 
    562     for train, test in cv) 

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable) 
    756    # was dispatched. In particular this covers the edge 
    757    # case of Parallel used with an exhausted iterator. 
--> 758    while self.dispatch_one_batch(iterator): 
    759     self._iterating = True 
    760    else: 

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator) 
    601 
    602   with self._lock: 
--> 603    tasks = BatchedCalls(itertools.islice(iterator, batch_size)) 
    604    if len(tasks) == 0: 
    605     # No more tasks available in the iterator: tell caller to stop. 

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in __init__(self, iterator_slice) 
    125 
    126  def __init__(self, iterator_slice): 
--> 127   self.items = list(iterator_slice) 
    128   self._size = len(self.items) 

c:\python34\lib\site-packages\sklearn\grid_search.py in <genexpr>(.0) 
    560          error_score=self.error_score) 
    561     for parameters in parameter_iterable 
--> 562     for train, test in cv) 
    563 
    564   # Out is a list of triplet: score, estimator, n_test_samples 

TypeError: 'StratifiedKFold' object is not iterable

When I write cv=StratifiedKFold(y_train), I get ValueError: The number of folds must be of Integral type. But when I write cv=5, it works.

I don't understand what is wrong with StratifiedKFold.

2016-10-26 user183897

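The API changed in the latest version. You used to pass y to the constructor; now you pass only the number of folds when creating the StratifiedKFold object, and y is passed later, when the data is split.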
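As a minimal sketch of the newer API described above: the constructor takes only the number of folds, and y is supplied later through split. The toy data (90 negatives, 10 positives) is hypothetical and exists only to show that each test fold keeps the class ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced toy data: 90 negatives followed by 10 positives.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# The constructor receives only the number of folds, not y.
skf = StratifiedKFold(n_splits=5)

# y is passed later, when the splits are actually generated.
positives_per_fold = []
for train_idx, test_idx in skf.split(X, y):
    positives_per_fold.append(int(y[test_idx].sum()))

print(positives_per_fold)
```

Each test fold here holds 20 samples, and stratification places the same number of positives in every fold.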

2016-10-26 08:46:39 simon

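I wrote `cv = StratifiedKFold(10)` and got `TypeError: 'StratifiedKFold' object is not iterable`. When should I pass y? – user183897

In the current version, import sklearn.model_selection.StratifiedKFold. Then you can write cv = StratifiedKFold(10) and there should be no error. Perhaps you are still importing from the old module, which is kept for compatibility until version 0.20. – simon

May I ask one more question? I downloaded the file scikit_learn-0.18-cp34-cp34m-win32.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn and installed it, but now I get `ImportError: DLL load failed: %1 is not a valid Win32 application.` What is wrong? – user183897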

It seems that cv=StratifiedKFold()).fit(X_train, y_train) should be changed to cv=StratifiedKFold()).split(X_train, y_train).

2017-01-14 19:19:36 ebrahimi

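That is unrelated to the error. The line clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train) just defines the object clf and then calls the fit method to train it. – sera

@rll also mentioned that fit should be replaced by split. – ebrahimi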

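The problem here is an API change, as mentioned in the other answers, but those answers could be more explicit.

The documentation for the cv parameter states: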

> cv : int, cross-validation generator or an iterable, optional
>   Determines the cross-validation splitting strategy.
>   Possible inputs for cv are:
>   - None, to use the default 3-fold cross-validation,
>   - integer, to specify the number of folds,
>   - An object to be used as a cross-validation generator,
>   - An iterable yielding train/test splits.
>
>   For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

So, whichever cross-validation strategy is used, all that is needed is to supply the generator returned by the split function, as suggested:

kfolds = StratifiedKFold(5) 
clf = GridSearchCV(estimator, parameters, scoring=qwk, cv=kfolds.split(xtrain,ytrain)) 
clf.fit(xtrain, ytrain)
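A self-contained sketch of the same idea, assuming scikit-learn 0.18 or later: with sklearn.model_selection, the splitter object itself and the splits it yields should both be accepted as cv. The dataset, the decision-tree estimator, and the tiny grid are hypothetical stand-ins chosen only to keep the run short and deterministic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced data (roughly 90% / 10%).
xtrain, ytrain = make_classification(
    n_samples=200, n_features=8, weights=[0.9, 0.1], random_state=0
)

parameters = {"max_depth": [2, 4]}
kfolds = StratifiedKFold(n_splits=5)

# Option 1: pass the splitter object directly.
search_obj = GridSearchCV(
    DecisionTreeClassifier(random_state=0), parameters, cv=kfolds
)
search_obj.fit(xtrain, ytrain)

# Option 2: pass the train/test index pairs produced by split().
# Materializing them in a list lets the same splits be reused.
splits = list(kfolds.split(xtrain, ytrain))
search_iter = GridSearchCV(
    DecisionTreeClassifier(random_state=0), parameters, cv=splits
)
search_iter.fit(xtrain, ytrain)

same_scores = np.allclose(
    search_obj.cv_results_["mean_test_score"],
    search_iter.cv_results_["mean_test_score"],
)
print(same_scores)
```

Because StratifiedKFold without shuffling is deterministic, both searches evaluate the same folds, so passing the object is usually the simpler choice.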

2017-06-01 14:34:07 rll

I had exactly the same problem.

The solution that worked for me was to replace:

from sklearn.grid_search import GridSearchCV

with

from sklearn.model_selection import GridSearchCV

Then it should work fine.
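Putting this together, a minimal runnable sketch with both classes imported from sklearn.model_selection might look like the following. The synthetic data and the reduced parameter grid are assumptions made to keep the example fast; they are not the asker's data or full grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical imbalanced training data (roughly 90% / 10%).
X_train, y_train = make_classification(
    n_samples=200, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Reduced grid for illustration only.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

rfc = RandomForestClassifier(random_state=0)

# Both GridSearchCV and StratifiedKFold come from model_selection,
# so the splitter object can be passed as cv without calling split().
clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold(n_splits=5))
clf.fit(X_train, y_train)

print(clf.best_params_)
```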

2017-06-01 15:00:41 sera

In sklearn version 0.18.1,

GridSearchCV(estimator, param_grid=param_grid, cv=5)

uses a StratifiedKFold with 5 splits when the estimator is a classifier.

Documentation:

> cv : int, cross-validation generator or an iterable, optional 
>   Determines the cross-validation splitting strategy. 
>   Possible inputs for cv are: 
>   - None, to use the default 3-fold cross validation, 
>   - integer, to specify the number of folds in a `(Stratified)KFold`, 
>   - An object to be used as a cross-validation generator. 
>   - An iterable yielding train, test splits.
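One way to see how an integer cv is resolved is the check_cv helper in sklearn.model_selection. This is only a sketch of the documented behaviour for a classifier with binary labels; the label array below is hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, check_cv

# Hypothetical binary labels for a classification task.
y = np.array([0] * 40 + [1] * 10)

# Ask scikit-learn how it would interpret cv=5 for a classifier.
resolved = check_cv(5, y, classifier=True)

print(type(resolved).__name__, resolved.get_n_splits())
```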

2017-10-19 20:37:59
