2016-11-29

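Error in prediction - ROCR package (using probabilities)

I used the "rfe" function with an SVM (method="svmRadial") to build a model with a reduced set of features. I then called "predict" on the test data, which outputs the predicted class label (binary), the class-0 probability and the class-1 probability. Next I passed the predicted probabilities and the true class labels to the prediction function from the ROCR package. However, I get the error below, and I don't understand why, because the two arrays have the same length: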

> pred_svm <- prediction(pred_svm_2class[,2], as.numeric(as.character(y))) 
Error in prediction(pred_svm_2class[, 2], as.numeric(as.character(y))) : 
Number of predictions in each run must be equal to the number of labels for each run. 

My code is below; the input data is here: click me. It is a small binary-classification dataset, so the code runs quickly.

library("caret") 
library("ROCR") 
sensor6data_2class <- read.csv("/home/sensei/clustering/svm_2labels.csv") 
sensor6data_2class <- within(sensor6data_2class, Class <- as.factor(Class)) 

set.seed("1298356") 
inTrain_svm_2class <- createDataPartition(y = sensor6data_2class$Class, p = .75, list = FALSE) 
training_svm_2class <- sensor6data_2class[inTrain_svm_2class,] 
testing_svm_2class <- sensor6data_2class[-inTrain_svm_2class,] 
trainX <- training_svm_2class[,1:20] 
y <- training_svm_2class[,21] 

ctrl_svm_2class <- rfeControl(functions = rfFuncs , method = "repeatedcv", number = 5, repeats = 2, allowParallel = TRUE) 
model_train_svm_2class <- rfe(x = trainX, y = y, data = training_svm_2class, sizes = c(1:20), metric = "Accuracy", rfeControl = ctrl_svm_2class, method="svmRadial") 

pred_svm_2class = predict(model_train_svm_2class, newdata=testing_svm_2class) 
pred_svm <- prediction(pred_svm_2class[,2], y) 
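One way to test the claim that the two arrays have the same length is to print both lengths right before calling prediction(). The base-R sketch below does this on a hypothetical 100-row stand-in for svm_2labels.csv, because the real file is not available here. A simple random 75/25 split stands in for createDataPartition, and runif() stands in for the predicted probabilities, so only the lengths are meaningful.

```r
# Hypothetical stand-in for svm_2labels.csv: 100 rows, 20 features, binary Class
set.seed(1)
n <- 100
dat <- data.frame(matrix(rnorm(n * 20), nrow = n))
dat$Class <- factor(rep(c(0, 1), length.out = n))  # column 21, as in the question

# Simple random 75/25 split (a stand-in for caret's createDataPartition)
in_train <- sample(seq_len(n), size = 75)
training <- dat[in_train, ]
testing  <- dat[-in_train, ]

y      <- training[, 21]  # training labels, as used in the failing call
y_test <- testing[, 21]   # test labels

# Stand-in for the probability column returned by predict(): one score per TEST row
scores <- runif(nrow(testing))

# Compare lengths before handing anything to ROCR::prediction()
print(c(scores = length(scores), y = length(y), y_test = length(y_test)))
```

Here the scores have 25 entries and y has 75, which is exactly the mismatch ROCR reports. y_test has 25 entries and matches the scores.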

Thanks in advance for your help.

Answer


This happens because of this line:

pred_svm <- prediction(pred_svm_2class[,2], y) 

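pred_svm_2class[,2] holds the predictions for the test data, while y holds the labels of the training data, so the two vectors have different lengths. Just store the test labels in a separate variable, like this: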

y_test <- testing_svm_2class[,21] 

Now, if you run

pred_svm <- prediction(pred_svm_2class[,2], y_test) 

there is no error. The complete code is below:

# install.packages("caret")
# install.packages("ROCR")
# install.packages("e1071")
# install.packages("randomForest")
library("caret")
library("ROCR")

# Load the data; column 21 (Class) is the binary target
sensor6data_2class <- read.csv("svm_2labels.csv")
sensor6data_2class <- within(sensor6data_2class, Class <- as.factor(Class))

# 75/25 train/test split, stratified on Class
set.seed("1298356")
inTrain_svm_2class <- createDataPartition(y = sensor6data_2class$Class, p = .75, list = FALSE)
training_svm_2class <- sensor6data_2class[inTrain_svm_2class,]
testing_svm_2class <- sensor6data_2class[-inTrain_svm_2class,]
trainX <- training_svm_2class[,1:20]
y <- training_svm_2class[,21]       # training labels, used to fit the model
y_test <- testing_svm_2class[,21]   # test labels, paired with the test predictions

# Recursive feature elimination, resampled with 5-fold CV repeated twice
ctrl_svm_2class <- rfeControl(functions = rfFuncs , method = "repeatedcv", number = 5, repeats = 2, allowParallel = TRUE)
model_train_svm_2class <- rfe(x = trainX, y = y, data = training_svm_2class, sizes = c(1:20), metric = "Accuracy", rfeControl = ctrl_svm_2class, method="svmRadial")

# One row of predictions per test row
pred_svm_2class = predict(model_train_svm_2class, newdata=testing_svm_2class)
# Scores and labels both come from the test set, so their lengths now match
pred_svm <- prediction(pred_svm_2class[,2], y_test)
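A second point may be worth checking. It depends on the column layout described in the question, which is predicted label, then class-0 probability, then class-1 probability. Under that layout, pred_svm_2class[,2] is the class-0 probability. ROCR infers the class ordering from R's < relation for unordered labels, so with 0/1 labels it treats 1 as the positive class. Scoring with the class-0 probability would then flip the ROC curve. The base-R sketch below computes AUC from ranks on simulated data to show this effect. It uses the Mann-Whitney formulation, not ROCR's own code, and the data and the auc_rank helper are hypothetical.

```r
# Rank-based AUC: (rank sum of positives - n_pos*(n_pos+1)/2) / (n_pos * n_neg)
# Hypothetical helper for illustration; ties receive average ranks via rank().
auc_rank <- function(scores, labels, positive = "1") {
  stopifnot(length(scores) == length(labels))  # mirrors ROCR's length check
  pos   <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r     <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(42)
labels <- factor(rep(c(0, 1), each = 50))
# Simulated class-1 probabilities: higher on average for true 1s
p1 <- plogis(rnorm(100, mean = ifelse(labels == "1", 1, -1)))
p0 <- 1 - p1  # class-0 probability (column 2 under the question's layout)

auc_rank(p1, labels)  # scoring with P(class 1): AUC above 0.5
auc_rank(p0, labels)  # scoring with P(class 0): 1 minus the value above
```

In the code above, the corresponding change would be to score with the class-1 column. Under the layout the question describes, that is the third column. Check names(pred_svm_2class) first, because the exact column names depend on the caret version and the functions used.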

That was really silly of me, but a genuine mistake. Thanks for your help! – tacqy2