2016-02-19

This is my code for digit classification with a non-linear SVM. I apply a cross-validation scheme to choose the hyperparameters C and gamma. However, the model returned by GridSearchCV has no n_support_ attribute from which to get the number of support vectors. How can I obtain the number of support vectors after cross-validation?

from sklearn import datasets 
from sklearn.cross_validation import train_test_split 
from sklearn.grid_search import GridSearchCV 
from sklearn.metrics import classification_report 
from sklearn.svm import SVC 
from sklearn.cross_validation import ShuffleSplit 


# Loading the Digits dataset 
digits = datasets.load_digits() 

# To apply an classifier on this data, we need to flatten the image, to 
# turn the data in a (samples, feature) matrix: 
n_samples = len(digits.images) 
X = digits.images.reshape((n_samples, -1)) 
y = digits.target 

# Split the dataset in two equal parts 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0) 

# Initialize an SVM estimator
estimator = SVC(kernel='rbf', C=1, gamma=1) 

# Choose the cross-validation iterator.
cv = ShuffleSplit(X_train.shape[0], n_iter=5, test_size=0.2, random_state=0) 

# Set the parameters by cross-validation 
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4,1,2,10], 
        'C': [1, 10, 50, 100, 1000]}, 
        {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}] 


clf = GridSearchCV(estimator=estimator, cv=cv, param_grid=tuned_parameters) 

# Run the cross-validation task to select the best model.
# After fitting, clf holds the best model with the best parameters C and gamma.
clf.fit(X_train, y_train) 

print()
print("Best parameters: ")
print(clf.best_params_)

print("score on test set with clf", clf.score(X_test, y_test))
print("score on training set with clf", clf.score(X_train, y_train))

# It does not work: GridSearchCV itself has no n_support_ attribute.
# So, how can I recover the number of support vectors?
# print("Number of support vectors by class", clf.n_support_)  # raises AttributeError

# Here is my method: I train a new SVM object with the best parameters, and I
# note that it gives the same test and training scores as clf.
clf2 = SVC(C=10, gamma=0.001)
clf2.fit(X_train, y_train)

print("score on test set with clf2", clf2.score(X_test, y_test))
print("score on training set with clf2", clf2.score(X_train, y_train))

print(clf2.n_support_)

Any comments on whether the method I propose is correct?

Answer


GridSearchCV fits a number of models. You can get the best one with clf.best_estimator_. Its n_support_ attribute, clf.best_estimator_.n_support_, gives the number of support vectors for each class, so sum(clf.best_estimator_.n_support_) gives the total number of support vectors (and clf.best_estimator_.support_ gives their indices).
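As a minimal sketch of this (using the current sklearn.model_selection import path; the deprecated sklearn.grid_search.GridSearchCV used in the question exposes the same best_estimator_ attribute):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Small illustrative grid; the question's full grid works the same way.
clf = GridSearchCV(SVC(kernel='rbf'),
                   param_grid={'C': [1, 10], 'gamma': [1e-3, 1e-4]})
clf.fit(X, y)

best_svc = clf.best_estimator_      # the refitted SVC with the winning parameters
print(best_svc.n_support_)          # support vectors per class (array of length n_classes)
print(best_svc.n_support_.sum())    # total number of support vectors
print(best_svc.support_[:5])        # indices of the first few support vectors
```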

You can also get the best model's parameters and its cross-validation score separately via clf.best_params_ and clf.best_score_.
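For example (again a sketch with the current import path; best_params_ and best_score_ behave the same way in the older sklearn.grid_search module):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = GridSearchCV(SVC(kernel='rbf'),
                   param_grid={'C': [1, 10, 100], 'gamma': [1e-3, 1e-4]})
clf.fit(X_train, y_train)

print("Best parameters:", clf.best_params_)  # the winning C and gamma
print("Best CV score:", clf.best_score_)     # mean cross-validated accuracy of the winner
```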