
Making grid search in sklearn ignore empty models

Using Python and scikit-learn, I'd like to do a grid search. But some of my models end up being empty. How can I make the grid search ignore those models?

I guess I could have a scoring function that returns 0 if the model is empty, but I'm not sure how. My setup looks like this:

import numpy as np
import sklearn.svm
import sklearn.grid_search

predictor = sklearn.svm.LinearSVC(penalty='l1', dual=False,
                                  class_weight='auto')
param_dist = {'C': pow(2.0, np.arange(-10, 11))}
learner = sklearn.grid_search.GridSearchCV(estimator=predictor,
                                           param_grid=param_dist,
                                           n_jobs=1,  # self.n_jobs in the original class context
                                           cv=5, verbose=0)
learner.fit(X, y)

Here, for my data, this learner object will choose a C that corresponds to an empty model. Any idea how I can make sure the model is not empty?

EDIT: by an "empty model" I mean a model that has selected 0 features to use. This happens easily with l1-regularized models in particular: if C in the SVM is small enough, the optimization problem finds the zero vector as the optimal solution for the coefficients. Hence predictor.coef_ will be a vector of 0s.
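To see the effect concretely, here is a minimal sketch on synthetic data (the make_classification call and its parameters are illustrative assumptions, not the asker's dataset):

import numpy as np
import sklearn.svm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# With l1 regularization and a very small C, the penalty term dominates
# and the optimal coefficient vector collapses to all zeros.
predictor = sklearn.svm.LinearSVC(penalty='l1', dual=False, C=2.0 ** -10)
predictor.fit(X, y)

print(np.count_nonzero(predictor.coef_))  # likely 0: an "empty" model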


What exactly is an empty model? – cel


Good question. Explained in the edit. – adrin


Why do you want to explicitly ignore those models? If a model with all-zero coefficients is the best one, then you know something is wrong. –

Answers


Try implementing a custom scorer, something similar to:

import numpy as np

def scorer_(estimator, X, y):
    # Your criterion here
    if np.allclose(estimator.coef_, np.zeros_like(estimator.coef_)):
        return 0
    else:
        return estimator.score(X, y)

learner = sklearn.grid_search.GridSearchCV(...
                                           scoring=scorer_)
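GridSearchCV accepts any callable with the signature (estimator, X, y) as its scoring argument, and since accuracy-style scores are non-negative, returning 0 keeps an empty model from ever being selected as best. Putting the scorer together with the setup from the question (a sketch reusing the predictor and param_dist defined there):

learner = sklearn.grid_search.GridSearchCV(estimator=predictor,
                                           param_grid=param_dist,
                                           scoring=scorer_,
                                           n_jobs=1, cv=5, verbose=0)
learner.fit(X, y)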

Nice use of the scorer interface! –


I don't think there is a built-in function for this; it's easy, however, to write a custom grid searcher:

import itertools
import operator

import numpy as np
from sklearn import metrics, svm
from sklearn.cross_validation import KFold
from sklearn.datasets import make_classification


def model_eval(X, y, model, cv):
    scores = []
    for train_idx, test_idx in cv:
        X_train, y_train = X[train_idx], y[train_idx]
        X_test, y_test = X[test_idx], y[test_idx]
        model.fit(X_train, y_train)
        nonzero_coefs = len(np.nonzero(model.coef_)[0])  # check for nonzero coefs
        if nonzero_coefs == 0:  # if they're all zero, don't evaluate any further; move to next hyperparameter combo
            return 0
        predictions = model.predict(X_test)
        score = metrics.accuracy_score(y_test, predictions)
        scores.append(score)
    return np.array(scores).mean()


X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

C = pow(2.0, np.arange(-20, 11))
penalty = {'l1', 'l2'}

parameter_grid = itertools.product(C, penalty)

kf = KFold(X.shape[0], n_folds=5)  # use the same folds to evaluate each hyperparameter combo

hyperparameter_scores = {}
for C, penalty in parameter_grid:
    model = svm.LinearSVC(dual=False, C=C, penalty=penalty)
    result = model_eval(X, y, model, kf)
    hyperparameter_scores[(C, penalty)] = result

sorted_scores = sorted(hyperparameter_scores.items(), key=operator.itemgetter(1))

best_parameters, best_score = sorted_scores[-1]
print(best_parameters)
print(best_score)
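Note that model_eval returns 0 as soon as any training fold yields an all-zero coefficient vector, so the remaining folds are skipped for that hyperparameter combination; and since accuracy_score is never negative, an empty model can never outrank a non-empty one in the final sort.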