Extracting the names of the selected features

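from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search before scikit-learn 0.18
from sklearn.pipeline import Pipeline

# Load dataset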
iris = datasets.load_iris() 
X, y = iris.data, iris.target 

rf_feature_imp = RandomForestClassifier(100) 
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5) 

clf = RandomForestClassifier(5000) 

model = Pipeline([ 
      ('fs', feat_selection), 
      ('clf', clf), 
     ]) 

params = { 
    'fs__threshold': [0.5, 0.3, 0.7], 
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'], 
    'clf__max_features': ['auto', 'sqrt', 'log2'], 
} 

gs = GridSearchCV(model, params, ...) 
gs.fit(X,y)

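The code above is based on "Ensuring right order of operations in random forest classification in scikit learn".

Since I am using SelectFromModel, I would like to print the names of the features that were selected (by the SelectFromModel step of the pipeline), but I am not sure how to extract them.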

2016-02-13 user308827

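One way is to call the feature selector's transform() on the feature names themselves, but the names have to be passed as a list of samples, that is, a list containing a single row.

First, get the feature selection stage from the best estimator found by GridSearchCV.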

fs = gs.best_estimator_.named_steps['fs']

Create a one-sample list from feature_names:

feature_names_example = [iris.feature_names]

Transform this sample with the feature selector.

selected_features = fs.transform(feature_names_example) 

print(selected_features[0])  # Select the one example
# ['sepal length (cm)' 'petal length (cm)' 'petal width (cm)']

2016-02-13 07:03:10

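Thanks again, @David Maust! – user308827

With scikit-learn 0.17.1, Python 2.7, and the 'load_iris' dataset, the value 0.7 in 'fs__threshold' caused an error. The line 'gs.fit(X, y)' produces the following: C:\Python27\lib\site-packages\sklearn\feature_selection\base.py:80: UserWarning: No features were selected: either the data is too noisy or the selection test too strict. UserWarning) Traceback (most recent call last): ValueError: Found array with 0 feature(s) (shape=(99, 0)) while a minimum of 1 is required. I found that if I remove 0.7, the code runs as expected. It seems strange, but at least it runs.

Yes. That makes sense if no feature has an importance greater than 0.7, which would not be surprising. Also, RandomForestClassifier is non-deterministic if random_state is not given.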
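To make the two points in the comment above concrete, here is a minimal sketch. It assumes a random forest fitted directly on iris (outside the pipeline) and an arbitrary random_state=0; the variable names are illustrative only. It checks that a fixed seed gives reproducible importances and counts how many features would survive each threshold from the question's grid. The importances are normalized to sum to 1, so a threshold of 0.7 can only be met when one feature carries most of the total importance.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Two forests with the same (arbitrarily chosen) seed should agree exactly;
# without random_state, each fit can give different importances.
rf_a = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
same_importances = np.allclose(rf_a.feature_importances_, rf_b.feature_importances_)

importances = rf_a.feature_importances_
total_importance = importances.sum()  # normalized importances sum to 1

# SelectFromModel keeps features whose importance is >= threshold,
# so count how many features each candidate threshold would keep.
kept_per_threshold = {}
for threshold in (0.3, 0.5, 0.7):
    kept_per_threshold[threshold] = int((importances >= threshold).sum())

print(dict(zip(iris.feature_names, importances)))
print(kept_per_threshold)
```

If a threshold keeps zero features in any cross-validation fold, the downstream classifier receives an empty array, which is consistent with the ValueError reported above.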

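SelectFromModel has a get_support() method that returns a boolean mask of the selected features. So you can do the following (in addition to the preliminary steps described by @David Maust):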

import numpy as np

feature_names = np.array(iris.feature_names)
selected_features = feature_names[fs.get_support()]
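Putting the pieces together, the following is a self-contained sketch of the get_support() approach. It is not the original poster's exact setup: the grid is reduced to two string thresholds ("mean" and "median") so that at least one feature is always kept, the forests are smaller, and random_state=0 and cv=3 are assumptions chosen to keep the run short and reproducible.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

iris = datasets.load_iris()
X, y = iris.data, iris.target

model = Pipeline([
    ("fs", SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Reduced grid (an assumption for this sketch): relative thresholds
# always keep at least one feature, unlike a fixed value such as 0.7.
params = {"fs__threshold": ["mean", "median"]}

gs = GridSearchCV(model, params, cv=3)
gs.fit(X, y)

# The refitted best pipeline exposes its steps by name.
fs = gs.best_estimator_.named_steps["fs"]
mask = fs.get_support()  # boolean array, one entry per input feature

feature_names = np.array(iris.feature_names)
selected_features = feature_names[mask]
print(selected_features)
```

Indexing a NumPy array of names with the mask avoids having to push strings through transform().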

2017-06-08 13:44:38 linkyndy

s = model.named_steps['fs'].fit(X, y)

X.columns[s.get_support()]
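The snippet above only works when X is a pandas DataFrame, since it relies on X.columns. A minimal sketch under that assumption follows; the DataFrame construction, the standalone selector, threshold="mean", and random_state=0 are all illustrative choices and not part of the original answer.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = datasets.load_iris()

# Wrap the data in a DataFrame so the column labels travel with it.
X_df = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Standalone selector for illustration; in the pipeline this would be
# model.named_steps['fs'] as in the answer above.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold="mean",
)
selector.fit(X_df, y)

# Index the column labels with the boolean support mask.
selected_columns = X_df.columns[selector.get_support()]
print(list(selected_columns))
```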

2017-11-05 01:26:53
