What should I do with best_score from a grid search run inside nested cross-validation?

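I have tuned a RandomForest using GridSearch with nested cross-validation. I know that afterwards I have to train on the whole dataset with the best parameters and then predict on out-of-sample data. What should I do with the best_score from the grid search obtained in nested cross-validation?

Do I need to fit the model twice? Once to get the accuracy estimate through nested cross-validation, and then again for the out-of-sample data?

Please check my code: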

import numpy as np
import scipy.io as sio
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

#Load data
for name in ["AWA"]:
    for el in ['Fp1']:
        data = sio.loadmat('/home/TrainVal/{}_{}.mat'.format(name, el))
        X = data['x']
        y = np.ravel(data['y'])

        print(name, el, X.shape, y.shape)
        print("")


#Pipeline
clf = Pipeline([('rcl', RobustScaler()),
                ('clf', RandomForestClassifier())])

#Optimization
#Outer loop
sss_outer = StratifiedShuffleSplit(n_splits=2, test_size=0.1, random_state=1)
#Inner loop
sss_inner = StratifiedShuffleSplit(n_splits=2, test_size=0.1, random_state=1)


# Use a full grid over all parameters
param_grid = {'clf__n_estimators': [10, 12, 15],
              'clf__max_features': [3, 5, 10],
              }


# Run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=sss_inner, n_jobs=-1)
#FIRST FIT!!!!!
grid_search.fit(X, y)
scores = cross_val_score(grid_search, X, y, cv=sss_outer)

#Show best parameter in inner loop
print(grid_search.best_params_)

#Show Accuracy average of all the outer loops
print(scores.mean())

#SECOND FIT!!!
#X_out_of_sample / y_out_of_sample stand for the held-out data, which is not defined in this snippet
y_score = grid_search.fit(X, y).score(X_out_of_sample, y_out_of_sample)
print(y_score)

Source

2017-03-11 Aizzaac

There are a few things you need to understand.

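When you do your "first fit", it fits the grid_search model according to the sss_inner CV and stores the result in grid_search.best_estimator_ (i.e. the best estimator according to the scores on the test data of the sss_inner folds).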

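Next, you pass that grid_search to cross_val_score (the nesting). The model fitted in your "first fit" is of no use here. cross_val_score clones the estimator and calls grid_search.fit() on the folds from sss_outer (that is, the training data of each sss_outer split is handed to grid_search, which splits it again according to sss_inner), then reports the scores on the test data of sss_outer. cross_val_score does not leave you with a fitted model.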
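The cloning behaviour described above can be sketched as follows. This is a minimal illustration, not the asker's setup: the data comes from make_classification as a stand-in for the .mat files, and the scaler step and the exact grid are left out or shortened for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, cross_val_score

# Synthetic stand-in for the data in the question (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

sss_outer = StratifiedShuffleSplit(n_splits=2, test_size=0.1, random_state=1)
sss_inner = StratifiedShuffleSplit(n_splits=2, test_size=0.1, random_state=1)

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 15], "max_features": [3, 5]},
    cv=sss_inner,
)

# No grid_search.fit(X, y) beforehand: cross_val_score fits its own clones,
# one per outer split, each running the inner grid search on the outer training part.
scores = cross_val_score(grid_search, X, y, cv=sss_outer)

print(scores)         # one accuracy value per outer split
print(scores.mean())  # nested cross-validation estimate

# The object passed in was only cloned, so it carries no fitted attributes.
print(hasattr(grid_search, "best_params_"))
```

Skipping the "first fit" does not change the nested scores, since only the clones are ever fitted inside cross_val_score.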

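Finally, in your "second fit" you fit again exactly as in the "first fit". There is no need to do that, because the model is already fitted. Just call grid_search.score(). Internally it calls score() on best_estimator_.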
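A small check of that claim, assuming the default settings (refit=True, no custom scoring) and synthetic data; X_new and y_new are hypothetical names for the out-of-sample data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data; X_new / y_new play the role of the out-of-sample set (assumption).
X_all, y_all = make_classification(n_samples=200, n_features=10, random_state=0)
X, X_new, y, y_new = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 15]},
    cv=3,
)
grid_search.fit(X, y)  # a single fit; with refit=True the best model is refit on all of X

# Both calls evaluate the same already-fitted best_estimator_.
score_via_search = grid_search.score(X_new, y_new)
score_via_best = grid_search.best_estimator_.score(X_new, y_new)
print(score_via_search, score_via_best)
```

If a custom scoring argument were passed to GridSearchCV, grid_search.score() would use that scorer instead of the estimator's own score(), so the two numbers could then differ.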

You can look at my answer here to learn more about nested cross-validation with grid search.

Source

2017-03-12 04:39:11

Your grid_search.best_estimator_ contains the fitted model chosen by cross-validation, with the best_params_ parameters. There is no need to refit it.

You can use:

clf = grid_search.best_estimator_ 
preds = clf.predict(X_unseen)
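As for the best_score_ asked about in the question: as far as the scikit-learn documentation describes it, best_score_ is the mean cross-validated score of the best parameter setting on the inner splits. Because that score was used to pick the parameters, it tends to be optimistic, which is why the outer-loop scores from cross_val_score are the ones usually reported as the performance estimate. The sketch below uses synthetic data, and the split into X and X_unseen is an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, train_test_split

# Synthetic data; X_unseen stands for the out-of-sample data (assumption).
X_all, y_all = make_classification(n_samples=200, n_features=10, random_state=0)
X, X_unseen, y, y_unseen = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

sss_inner = StratifiedShuffleSplit(n_splits=2, test_size=0.1, random_state=1)
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 15], "max_features": [3, 5]},
    cv=sss_inner,
)
grid_search.fit(X, y)

# best_score_ is the inner-CV mean test score of the selected parameter setting.
inner_mean = grid_search.cv_results_["mean_test_score"][grid_search.best_index_]
print(grid_search.best_params_, grid_search.best_score_, inner_mean)

# Predicting through the search object or through best_estimator_ gives the same labels.
preds_search = grid_search.predict(X_unseen)
preds_best = grid_search.best_estimator_.predict(X_unseen)
print(np.array_equal(preds_search, preds_best))
```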

Source

2017-03-12 01:26:25 Ash

Simply calling 'grid_search.score()' or 'grid_search.predict()' has the same effect, because it automatically accesses 'best_estimator_' internally.
