2017-11-11 125 views

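User-defined summary function in caret

When using the caret package, I cannot get the following user-defined summary function to work. It is supposed to compute the log loss, but I keep getting a message that logloss was not found. A reproducible example is below: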

data <- data.frame('target' = sample(c('Y','N'),100,replace = T), 'X1' = runif(100), 'X2' = runif(100)) 

log.loss2 <- function(data, lev = NULL, model = NULL) { 
    logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs) 
    names(logloss) <- c('LL') 
    logloss 
} 

fitControl <- trainControl(method="cv",number=1, classProbs = T, summaryFunction = log.loss2) 

my.grid <- expand.grid(.decay = c(0.05), .size = c(2)) 

fit.nnet2 <- train(target ~., data = data, 
        method = "nnet", maxit = 500, metric = 'LL', 
        tuneGrid = my.grid, verbose = T) 

Answer

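The error occurs because you did not include trControl = fitControl in the call to train. Fixing that, however, leads to a second error, which is caused by data$obs and data$pred being factors - they need to be converted to numeric, which gives 1 and 2, and then have 1 subtracted to give the desired 0 and 1.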
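The factor-to-numeric conversion described above can be checked in isolation. This is only a small sketch on a made-up factor (the variable names `obs`, `codes` and `real` are illustrative and not part of caret); it assumes the levels are ordered N, Y, as in the question's data.

```r
# Toy factor with the same two levels as the question's target column.
obs <- factor(c("N", "Y", "Y", "N"), levels = c("N", "Y"))

# as.numeric() on a factor returns the internal level codes, not 0/1.
codes <- as.numeric(obs)
print(codes)  # 1 2 2 1

# Subtracting 1 maps the first level (N) to 0 and the second level (Y) to 1.
real <- codes - 1
print(real)   # 0 1 1 0
```

Because the second level ends up as 1, the probability column used in the formula has to be the one for that second level (here Y).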

log.loss2 <- function(data, lev = NULL, model = NULL) { 
    data$pred <- as.numeric(data$pred)-1 
    data$obs <- as.numeric(data$obs)-1 
    logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs) 
    names(logloss) <- c('LL') 
    logloss 
} 

fitControl <- trainControl(method="cv",number=1, classProbs = T, summaryFunction = log.loss2) 

fit.nnet2 <- train(target ~., data = data, 
        method = "nnet", maxit = 500, metric = "LL" , 
        tuneGrid = my.grid, verbose = T, trControl = fitControl) 
#output 
Neural Network 

100 samples 
    2 predictor 
    2 classes: 'N', 'Y' 

No pre-processing 
Resampling: Cross-Validated (1 fold) 
Summary of sample sizes: 0 
Resampling results: 

    LL  
    0.6931472 

Tuning parameter 'size' was held constant at a value of 2 
Tuning parameter 'decay' was held constant at a value of 0.05 

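A few things to note:

This loss function only works for data whose classes are N/Y, because the probability is taken from data$Y; a better approach is to look up the class names and use those. In addition, it is good practice to clip the probability values, since taking log(0) is not a good idea: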

LogLoss <- function (data, lev = NULL, model = NULL) 
    { 
    obs <- data[, "obs"] 
    cls <- levels(obs) #find class names 
    probs <- data[, cls[2]] #use second class name 
    probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15) #bound probability 
    logPreds <- log(probs)   
    log1Preds <- log(1 - probs) 
    real <- (as.numeric(data$obs) - 1) 
    out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1 
    names(out) <- c("LogLoss") 
    out 
    } 
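To see why the clipping matters, the sketch below evaluates the same formula with and without bounded probabilities on a small made-up data frame. The helper `clipped_log_loss` and the `toy` data are hypothetical and only mirror the idea above; the column layout (obs, pred, and one probability column per class) is an assumption about what the summary function receives when classProbs = TRUE.

```r
# Hypothetical helper mirroring the clipping idea; not part of caret.
clipped_log_loss <- function(data, eps = 1e-15) {
  cls <- levels(data$obs)                            # class names, e.g. "N" and "Y"
  probs <- pmax(pmin(data[[cls[2]]], 1 - eps), eps)  # keep probabilities inside (0, 1)
  real <- as.numeric(data$obs) - 1                   # 1 for the second class, 0 otherwise
  -mean(real * log(probs) + (1 - real) * log(1 - probs))
}

# Made-up predictions, including probabilities of exactly 0 and 1.
toy <- data.frame(
  obs  = factor(c("Y", "N", "Y", "N"), levels = c("N", "Y")),
  pred = factor(c("Y", "N", "N", "N"), levels = c("N", "Y")),
  N    = c(0.0, 0.8, 0.6, 1.0),
  Y    = c(1.0, 0.2, 0.4, 0.0)
)

# Without clipping, 0 * log(0) is NaN, which poisons the whole mean.
real <- as.numeric(toy$obs) - 1
unclipped <- -mean(real * log(toy$Y) + (1 - real) * log(1 - toy$Y))
print(unclipped)

# With clipping the result stays finite.
clipped <- clipped_log_loss(toy)
print(clipped)
```

Since a lower log loss is better, it may also be worth checking that train is asked to minimize the metric (for example through its maximize argument) once more than one tuning combination is compared.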

This is perfect! Thank you very much. I was running into both errors, so thanks for catching the follow-up problem as well – dleal

You're welcome. Check the edit for some additional notes. – missuse