2016-10-17

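KNN text classification in R with K-fold cross-validation

I have built a text classifier that uses KNN to sort comments into categories, for example: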

Comment                    Category
Good Service provided      Service
Excellent Communication    Communication

I ran the classification with:

knn(modeldata[train, ], modeldata[test, ], cl[train], k = 2, use.all = TRUE)

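Now I want to evaluate this model with K-fold cross-validation. I am looking for some measure that tells me whether the model is overfitting, underfitting, and so on.

I used

knn.cv(modeldata[train, ], cl[train], k = 2, use.all = TRUE)

but the help page for this command says it returns NA when the classification is in doubt. Please advise.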
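For reference, `knn.cv` in the `class` package performs leave-one-out cross-validation rather than K-fold. A rough sketch of doing K-fold by hand with `knn` might look like the following. The iris data stands in for `modeldata` and `cl`, and the fold count, seed, and `k = 5` are arbitrary choices for illustration, not values from the question.

```r
library(class)  # provides knn()

set.seed(42)                 # arbitrary seed so the fold assignment is reproducible
x <- scale(iris[, 1:4])      # stand-in numeric features (assumption, not the asker's data);
                             # scaling on all rows is a simplification
y <- iris$Species            # stand-in class labels (assumption)
n_folds <- 5                 # hypothetical number of folds

# Assign every row to one of the folds at random
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(x)))

# Train on all folds but one, predict the held-out fold, record its accuracy
fold_acc <- sapply(seq_len(n_folds), function(f) {
  test <- fold_id == f
  pred <- knn(x[!test, ], x[test, ], cl = y[!test], k = 5)
  mean(pred == y[test])
})

cv_acc    <- mean(fold_acc)                        # cross-validated accuracy
train_acc <- mean(knn(x, x, cl = y, k = 5) == y)   # accuracy on the training data itself
```

As a rule of thumb, a `train_acc` far above `cv_acc` hints at overfitting, while low values for both hint at underfitting.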

Answer


Which package are you using for knn? You could use caret for the cross-validation, along these lines (an example with the iris dataset):

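library(caret)  # provides train() and trainControl()

training <- iris
# Repeated cross-validation: 10 folds (as the output reports), repeated 3 times
ctrl <- trainControl(method = "repeatedcv", repeats = 3)
knnFit <- train(Species ~ ., data = training, method = "knn",
                trControl = ctrl, preProcess = c("center", "scale"))
knnFit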

which gives the output:

k-Nearest Neighbors 

150 samples 
    4 predictor 
    3 classes: 'setosa', 'versicolor', 'virginica' 

Pre-processing: centered (4), scaled (4) 
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters: 

  k  Accuracy   Kappa
  5  0.9511111  0.9266667
  7  0.9577778  0.9366667
  9  0.9533333  0.9300000

Accuracy was used to select the optimal model using the largest value. 
The final value used for the model was k = 7. 
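To make the Accuracy and Kappa columns less opaque, here is a sketch that recomputes both from a confusion matrix using base R and `class` only. It uses leave-one-out predictions from `knn.cv` instead of caret's repeated 10-fold resampling, so the numbers will not match the table above exactly. The name `kappa_hat` is made up for this example.

```r
library(class)  # provides knn.cv()

set.seed(1)  # knn breaks distance ties at random, so fix the seed
# Leave-one-out predictions for every row of iris, with k = 7
pred <- knn.cv(scale(iris[, 1:4]), cl = iris$Species, k = 7)
cm   <- table(predicted = pred, actual = iris$Species)  # confusion matrix

n         <- sum(cm)
accuracy  <- sum(diag(cm)) / n                         # observed agreement
expected  <- sum(rowSums(cm) * colSums(cm)) / n^2      # agreement expected by chance
kappa_hat <- (accuracy - expected) / (1 - expected)    # Cohen's kappa
```

Kappa corrects accuracy for chance agreement, which makes it the more informative of the two when the categories are unbalanced.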

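I am using the "class" package for KNN. The code above works for the iris dataset but not for my dataset, which has only two columns; I am not sure whether the number of columns is the reason. When I run the command above, I get this message: Warning in preProcess.default(thresh = 0.95, k = 5, method = c("center", : These variables have zero variances: – Sourabh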
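A zero-variance warning like this usually means some predictor columns hold the same value in every row, so centering and scaling cannot do anything useful with them. The following base R sketch finds and drops such columns. The matrix `dtm` and its term names are invented for illustration and are not the commenter's data.

```r
# Hypothetical document-term matrix: one row per comment, one column per term count
dtm <- cbind(good    = c(1, 0, 1),
             service = c(1, 0, 0),
             the     = c(1, 1, 1))   # "the" is constant across all rows

col_var   <- apply(dtm, 2, var)              # variance of each column
constant  <- names(col_var)[col_var == 0]    # columns that never change
dtm_clean <- dtm[, col_var > 0, drop = FALSE]  # keep only informative columns
```

Whether this applies here depends on how the text column was turned into numeric features, which the comment does not show.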


I also tried the statement below: knnFit1 <- train(Category_Text, data = x, method = "knn", preProcess = NULL, trControl = trainControl(method = "cv", number = 5, classProbs = FALSE)). Error message: One or more factor levels in the outcome has no data. I checked all the factors but found no blank or empty levels. – Sourabh
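One possible cause of that error, which cannot be confirmed from the comment alone, is a factor that still carries levels with no rows, for example after subsetting. Such levels are not blank, so they are easy to miss when scanning the values. A sketch of checking for and dropping them follows; the data frame is a made-up stand-in for the commenter's `x`, and the "Pricing" level is invented.

```r
# Hypothetical stand-in for the commenter's data frame `x`
x <- data.frame(
  Text     = c("Good Service provided", "Excellent Communication"),
  Category = factor(c("Service", "Communication"),
                    levels = c("Communication", "Service", "Pricing")),
  stringsAsFactors = FALSE
)

table(x$Category)                      # an unused level shows up with a count of 0
x$Category <- droplevels(x$Category)   # remove levels that have no observations
levels(x$Category)
```

After `droplevels`, only levels that actually occur in the data remain in the outcome factor.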