2017-02-17 111 views

Stratified GroupShuffleSplit in scikit-learn

I would like to ask whether it is possible to do a "Stratified GroupShuffleSplit" in scikit-learn, in other words a combination of GroupShuffleSplit and StratifiedShuffleSplit.

Here is a sample of the code I am using:

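from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.svm import SVC

# Group-aware shuffle split: no group ends up in both train and test
cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(
    allr_sets_nor[:,:2], allr_labels, groups=allr_groups)
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol),
    param_grid=param_grid, scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:,:2], allr_labels)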

Here I applied GroupShuffleSplit, but I still want to add stratification according to allr_labels.

StratifiedShuffleSplit also has a groups parameter, if that is what you want. Just use StratifiedShuffleSplit with allr_labels and pass the groups to the fit() method of GridSearchCV.

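Unfortunately that does not work for me. I think this option has no effect, because the documentation says: "Always ignored, exists for compatibility."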
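
To illustrate the point made in this comment, here is a minimal sketch showing that stratifying on labels alone does not keep groups intact. The data is synthetic and assumed purely for illustration (12 groups of 6 samples, two balanced classes). The splitter is given only the labels, which is consistent with the documentation's statement that groups is ignored.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic data (assumed for illustration): 12 groups of 6 samples each,
# every sample in a group shares the group's label (two classes)
groups = np.repeat(np.arange(12), 6)
labels = np.repeat(np.arange(12) % 2, 6)
X = np.zeros((len(labels), 2))  # feature values are irrelevant for the split

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(X, labels))

# Groups that have samples on both sides of the split
leaked = np.intersect1d(groups[train_idx], groups[test_idx])
print("groups in both train and test:", leaked)
```

With these sizes the test set receives 9 samples per class, which cannot be made up of whole 6-sample groups, so at least one group is necessarily split between train and test.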

Answer

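I solved this problem by applying StratifiedShuffleSplit to the groups and then finding the training and test set indices manually, since they are tied to the group indices (in my case, each group contains 6 consecutive sets, from 6*index to 6*index+5).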

As follows:

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Stratified splitting at the group level only; all_groups and all_labels
# are assumed to hold one entry per group
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(all_groups, all_labels)

# Map each selected group index g to its six sample indices 6*g .. 6*g+5.
# The result is the final cross-validation iterable: a list of n_splits tuples,
# each holding two numpy arrays (training indices, testing indices).
# The offset is named k so it cannot clash with any outer loop counter.
cv = [(np.concatenate([train_index*6 + k for k in range(6)]),
       np.concatenate([test_index*6 + k for k in range(6)]))
      for train_index, test_index in sss]

opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol), param_grid=param_grid,
       scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:,:2], allr_labels)
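
The index arithmetic in this answer relies on every group holding exactly six consecutive samples. As a minimal sketch of the same idea for arbitrary group layouts, the helper below stratifies the unique groups and then maps them back to sample indices with `np.isin`. The function name `stratified_group_shuffle_split` and the synthetic data are illustrative assumptions, not part of scikit-learn or of the original post, and the sketch assumes that all samples in a group share one label.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_group_shuffle_split(labels, groups, n_splits=5, test_size=0.25,
                                   random_state=None):
    """Hypothetical helper: return a list of (train_idx, test_idx) tuples.

    Assumes all samples of a group share the same label.
    """
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    # The first sample of each group supplies the group-level label
    unique_groups, first_idx = np.unique(groups, return_index=True)
    group_labels = labels[first_idx]

    sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                 random_state=random_state)
    cv = []
    for train_g, test_g in sss.split(unique_groups, group_labels):
        # Expand the selected groups back to sample indices
        train_idx = np.flatnonzero(np.isin(groups, unique_groups[train_g]))
        test_idx = np.flatnonzero(np.isin(groups, unique_groups[test_g]))
        cv.append((train_idx, test_idx))
    return cv


# Synthetic data (assumed): 12 groups of 6 samples, two balanced classes
groups = np.repeat(np.arange(12), 6)
labels = np.repeat(np.arange(12) % 2, 6)
cv = stratified_group_shuffle_split(labels, groups, n_splits=3,
                                    test_size=0.25, random_state=0)
for train_idx, test_idx in cv:
    shared = np.intersect1d(groups[train_idx], groups[test_idx])
    print(len(train_idx), len(test_idx), "shared groups:", len(shared))
```

The returned list has the same shape as the `cv` list built in the answer, so it could be passed to `GridSearchCV` through the `cv` argument in the same way.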