2017-10-18

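Illegal column names error, but column names are legal

I am wondering why I get this error. I can only reproduce it when the factor levels in my data frame would make illegal column names, but then why does it work with the rf implementation?

I am considering ranger because it seems to run faster.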

library(caret) 
library(ranger) 
library(randomForest) 

df <- data.frame(class = c(rep(c('A','B'), 10)), var1 = runif(20, 0,10), var2 = runif(20, 0,20), var3 = c(rep(c(' A','1 B', 'C'), 6), 'D','D')) 
df 

CTRL <- trainControl(method = "repeatedcv", 
        number = 2, 
        repeats = 1, 
        verboseIter = TRUE, 
        classProbs = TRUE, 
        returnResamp = "final", 
        summaryFunction = twoClassSummary) 

ranger_model <- caret::train(class ~ ., 
           df, 
           method = "ranger", 
           trControl = CTRL, 
           preProc = c("center", "scale"), 
           metric="ROC", 
           tuneGrid = expand.grid(.mtry=c(1,2))) 

rf_model <- caret::train(class ~ ., 
           df, 
           method = "rf", 
           trControl = CTRL, 
           preProc = c("center", "scale"), 
           metric="ROC", 
           tuneGrid = expand.grid(.mtry=c(1,2))) 

ranger_model 
rf_model 

ranger output:

+ Fold1.Rep1: mtry=1 
model fit failed for Fold1.Rep1: mtry=1 Error in parse.formula(formula, data) : 
Error: Illegal column names in formula interface. Fix column names or use alternative interface in ranger. 

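Also, when I look at the ranger code that produces the error, I don't understand why this check evaluates to TRUE, because when I run the same code on my df I don't get the same result: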

## Error if illegal column name 
if (!all(make.names(independent_vars[!interaction_idx]) == independent_vars[!interaction_idx])) { 
stop("Error: Illegal column names in formula interface. Fix column names or use alternative interface in ranger.") 
} 

https://github.com/cran/ranger/blob/master/R/formula.R

When I run it on my df:

formula <- 'class ~ .' 
data <- df 

f <- as.formula(formula) 
t <- terms(f, data = data) 

## Get dependent var(s) 
response <- data.frame(eval(f[[2]], envir = data)) 
colnames(response) <- deparse(f[[2]]) 

## Get independent vars 
independent_vars <- attr(t, "term.labels") 
interaction_idx <- grepl(":", independent_vars) 

## Error if illegal column name 
if (!all(make.names(independent_vars[!interaction_idx]) == independent_vars[!interaction_idx])) { 
    print("Error: Illegal column names in formula interface. Fix column names or use alternative interface in ranger.") 
} 

> !all(make.names(independent_vars[!interaction_idx]) == independent_vars[!interaction_idx]) 
## [1] FALSE 

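Is it because the factor columns are turned into a one-hot encoded matrix that uses the factor levels as column names? Again, I'm not sure why it would work in rf but not in ranger.

Thoughts?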
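The one-hot hypothesis can be checked in base R. This is a minimal sketch, assuming caret's formula interface expands factor predictors into dummy columns in roughly the way `model.matrix` does; that is an assumption about caret internals, not something confirmed in this post.

```r
# Same shape of data as in the question
df <- data.frame(
  class = rep(c("A", "B"), 10),
  var1  = runif(20, 0, 10),
  var2  = runif(20, 0, 20),
  var3  = c(rep(c(" A", "1 B", "C"), 6), "D", "D")
)

# ASSUMPTION: the formula method builds a dummy-encoded predictor matrix
# similar to this one (intercept column dropped)
x <- model.matrix(class ~ ., data = df)[, -1]
dummy_names <- colnames(x)
print(dummy_names)

# Same style of test as the ranger check quoted above, applied to the
# dummy column names instead of the original column names
is_legal <- make.names(dummy_names) == dummy_names
print(data.frame(name = dummy_names, legal = is_legal))
any_illegal <- !all(is_legal)
print(any_illegal)
```

Whichever level of var3 ends up as the reference level, at least one dummy column name contains a space, so a `make.names`-based check flags it even though var1, var2, and var3 are themselves legal names.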

Answers

This should be fixed in caret 6.0-77. In your example, you also have to add the splitrule parameter to tuneGrid:

library(caret) 
library(ranger) 
library(randomForest) 

df <- data.frame(class = c(rep(c('A','B'), 10)), var1 = runif(20, 0,10), var2 = runif(20, 0,20), var3 = c(rep(c(' A','1 B', 'C'), 6), 'D','D')) 
df 

CTRL <- trainControl(method = "repeatedcv", 
        number = 2, 
        repeats = 1, 
        verboseIter = TRUE, 
        classProbs = TRUE, 
        returnResamp = "final", 
        summaryFunction = twoClassSummary) 

ranger_model <- caret::train(class ~ ., 
          df, 
          method = "ranger", 
          trControl = CTRL, 
          preProc = c("center", "scale"), 
          metric="ROC", 
          tuneGrid = expand.grid(.mtry=c(1,2), .splitrule="gini")) 

rf_model <- caret::train(class ~ ., 
         df, 
         method = "rf", 
         trControl = CTRL, 
         preProc = c("center", "scale"), 
         metric="ROC", 
         tuneGrid = expand.grid(.mtry=c(1,2))) 

ranger_model 
rf_model
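If upgrading caret is not an option, one possible workaround (not part of the answer above, and untested against older caret releases) is to make the factor levels syntactically valid before training, so that any dummy column names derived from them are legal as well. A minimal sketch in base R, using `model.matrix` only as a stand-in for the dummy encoding:

```r
df <- data.frame(
  class = rep(c("A", "B"), 10),
  var1  = runif(20, 0, 10),
  var2  = runif(20, 0, 20),
  var3  = c(rep(c(" A", "1 B", "C"), 6), "D", "D")
)

# Convert to a factor and replace its levels with syntactically valid names
df$var3 <- factor(df$var3)
levels(df$var3) <- make.names(levels(df$var3), unique = TRUE)
print(levels(df$var3))

# ASSUMPTION: model.matrix approximates the dummy encoding caret applies
clean_names <- colnames(model.matrix(class ~ ., data = df))[-1]
all_legal <- all(make.names(clean_names) == clean_names)
print(all_legal)
```

The cost is that the levels are renamed, so any later reporting needs to map them back to the original labels.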