2017-02-28 149 views
-1

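Random forest: finding relevant features

I am trying to train an RF model for classification in sklearn. With one particular set of features, I get low test accuracy. I assume some of the features I chose are misleading the model, so I tried RFE, RFECV, etc. to find a relevant subset of features, but that did not improve accuracy. I came up with a simple feature selection procedure, as follows: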

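from sklearn.ensemble import RandomForestClassifier

# X, Y: training DataFrame and labels; Xt, Yt: test DataFrame and labels
# init_params: dict of keyword arguments for RandomForestClassifier
ml_feats = [...]  # initial list of feature (column) names, elided in the original

while True:
    feats_to_del = []
    prev_score = 0
    # Grow the feature prefix one feature at a time, up to the full list
    for feat_len in range(2, len(ml_feats) + 1):
        classifier = RandomForestClassifier(**init_params)
        classifier.fit(X[ml_feats[:feat_len]], Y)
        score = classifier.score(Xt[ml_feats[:feat_len]], Yt)
        if score < prev_score:
            # ml_feats[feat_len - 1] is the feature just added, i.e. the one that lowered the score
            print(ml_feats[feat_len - 1])
            feats_to_del.append(ml_feats[feat_len - 1])
        prev_score = score
    if len(feats_to_del) == 0:
        break
    # Delete irrelevant features, preserving the original order
    ml_feats = [f for f in ml_feats if f not in feats_to_del]

print(ml_feats)  # print all relevant features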

Does the code above help find the right feature set? Thanks.

Answers

0

What you are doing is greedy feature selection. If you want to use RandomForestClassifier to select features, you can do it like this:

from sklearn.ensemble import RandomForestClassifier 
from sklearn.feature_selection import SelectFromModel 
# xtrain : training data 
# ytrain : training labels 

clf = RandomForestClassifier() 
sfm = SelectFromModel(estimator=clf, threshold='mean') # threshold of selection is mean of feature importances by random forest classifier 
sfm.fit(xtrain, ytrain) 
selected_xtrain = sfm.transform(xtrain) 
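
To check whether the selection actually helps, one option is to compare held-out accuracy with and without it. The following is only a sketch: the synthetic dataset from `make_classification`, the train/test split, and the names `xtest`, `ytest`, `mask`, `full`, and `reduced` are assumptions for illustration and are not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): 20 features, 5 of them informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3,
                                                random_state=0)

# Fit the selector on the training data only, so the test set stays unseen
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                      threshold='mean')
sfm.fit(xtrain, ytrain)

mask = sfm.get_support()  # boolean mask, True for each kept feature
selected_xtrain = sfm.transform(xtrain)
selected_xtest = sfm.transform(xtest)

# Train one model on all features and one on the selected subset
full = RandomForestClassifier(n_estimators=100, random_state=0)
full.fit(xtrain, ytrain)
reduced = RandomForestClassifier(n_estimators=100, random_state=0)
reduced.fit(selected_xtrain, ytrain)

full_score = full.score(xtest, ytest)
reduced_score = reduced.score(selected_xtest, ytest)
print("kept", mask.sum(), "of", mask.size, "features")
print("accuracy, all features:", full_score)
print("accuracy, selected features:", reduced_score)
```

If the two scores are close, the dropped features were probably not what was hurting accuracy, and the problem may lie elsewhere (too little data, noisy labels, or features that carry little signal to begin with).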
+0

Will it help remove irrelevant features?

+0

Yes. Why don't you try it?

+0

I tried it... there was no significant improvement in accuracy.