2017-03-17 150 views

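sklearn grid search with grouped K-fold CV generator

I want to run a hyperparameter search in sklearn using randomized search (RandomizedSearchCV) together with a grouped k-fold cross-validation generator. The following works: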

skf=StratifiedKFold(n_splits=5,shuffle=True,random_state=0) 
rs=sklearn.model_selection.RandomizedSearchCV(clf,parameters,scoring='roc_auc',cv=skf,n_iter=10) 
rs.fit(X,y) 

This does not:

gkf=GroupKFold(n_splits=5) 
rs=sklearn.model_selection.RandomizedSearchCV(clf,parameters,scoring='roc_auc',cv=gkf,n_iter=10) 
rs.fit(X,y) 

#ValueError: The groups parameter should not be None 

How do I specify the groups parameter?

This does not work either:

gkf=GroupKFold(n_splits=5) 
fv = gkf.split(X, y, groups=groups) 
rs=sklearn.model_selection.RandomizedSearchCV(clf,parameters,scoring='roc_auc',cv=fv,n_iter=10) 
rs.fit(X,y) 

#TypeError: object of type 'generator' has no len() 

Answer

For reference, this is done by passing groups to fit:

rs.fit(X,y,groups=groups) 

where rs is defined as

rs=sklearn.model_selection.RandomizedSearchCV(forest,parameters,scoring='roc_auc',cv=gkf,n_iter=10)
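
To make the answer reproducible, here is a minimal end-to-end sketch. The synthetic data, the group layout (20 groups of 10 samples each), and the parameter grid are assumptions chosen for illustration; only the `GroupKFold` splitter, the `roc_auc` scoring, and the `rs.fit(X, y, groups=groups)` call come from the post above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical grouping: 20 groups of 10 consecutive samples each.
groups = np.repeat(np.arange(20), 10)

forest = RandomForestClassifier(random_state=0)

# Illustrative search space (not taken from the original question).
parameters = {"n_estimators": [10, 20, 50], "max_depth": [2, 4, None]}

gkf = GroupKFold(n_splits=5)
rs = RandomizedSearchCV(
    forest,
    parameters,
    scoring="roc_auc",
    cv=gkf,
    n_iter=5,
    random_state=0,
)

# The groups are handed to fit(), which forwards them to gkf.split().
rs.fit(X, y, groups=groups)

# Sanity check: no group appears on both sides of any split.
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    assert not set(groups[train_idx]) & set(groups[test_idx])

print(rs.best_params_)
```

The splitter object itself is passed as `cv`, and the group labels travel separately through `fit`. If precomputed splits are preferred, `cv` also accepts an iterable of (train, test) index pairs, so materializing the generator with `cv=list(gkf.split(X, y, groups=groups))` should avoid the `len()` error from the question.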