2016-08-11

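Does anyone know why the KNN R code below gives different predictions for different seeds? This is strange because K <- 5 is odd and there are only two classes, so the majority vote should be well defined. Also, the floating-point numbers are not so small that data precision should be a problem. (Note: I know the test point is oddly different from the training data. This is just a synthetic example created to demonstrate the strange KNN behavior.) Q: KNN in R - strange behavior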

library(class) 

train <- rbind(
    c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015), 
    c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861), 
    c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332), 
    c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033), 
    c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272), 
    c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095) 
) 
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241) 

K <- 5 

# Store the seed in a variable so that message() can report it
seed <- 494139
set.seed(seed)
pred <- knn(train = train, test = test, cl = trainLabels, k = K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 1, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train = train, test = test, cl = trainLabels, k = K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371

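What exactly is your question? There is a bug in the R code: the last test is supposed to use the same seed as the second one, but it does not, because the seed is never set. Is that the source of your confusion? – AlexR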

Answer


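The knn function calls an underlying C function (line 122) called VR_knn, which includes a step that introduces a "fuzz", or small tolerance (epsilon, EPS), when comparing distances. It looks like your example values may be hitting that "fuzz" step. The evidence for this is that rounding your training values gives consistent predictions across seeds. Thus: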

rm(list=ls()) 

library(class) 
train <- rbind(
    c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015), 
    c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861), 
    c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332), 
    c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033), 
    c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272), 
    c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095) 
) 
trainLabels <- c(1,1,0,0,1,0) 
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241) 
K <- 5 

train <- round(train,4) 

seed <- 494139 
set.seed(seed) 
pred <- knn(train=train, test=test, cl = trainLabels, k=K) 
message("predicted: ", pred, ", seed: ", seed) 
# predicted: 0, seed: 494139 

seed <- 5371 
set.seed(seed) 
pred <- knn(train=train, test=test, cl = trainLabels, k=K) 
message("predicted: ", pred, ", seed: ", seed) 
# predicted: 0, seed: 5371
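
To make the "fuzz" idea more concrete, here is a minimal base-R sketch that computes the squared Euclidean distance from the test point to each training row of the unrounded data and checks how close the 5th and 6th nearest neighbours are. It rests on assumptions: the tolerance of 1e-4 is my recollection of EPS in the package's C source, and `eps_assumed`, `sq_dist`, `ranked`, and `rel_gap` are names introduced only for this illustration. According to ?knn, ties for the kth nearest vector are all included in the vote, and a tied vote is broken at random, which would be where the seed comes in.

```r
# Training data, labels, and test point copied from the question (unrounded).
train <- rbind(
    c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
    c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
    c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
    c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
    c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
    c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

# Squared Euclidean distance from the test point to every training row.
sq_dist <- rowSums(sweep(train, 2, test)^2)

# Rank the training rows from nearest to farthest.
ord <- order(sq_dist)
ranked <- data.frame(row = ord, label = trainLabels[ord], sq_dist = sq_dist[ord])
print(ranked)

# Relative gap between the K-th and (K+1)-th nearest neighbours.
rel_gap <- (ranked$sq_dist[K + 1] - ranked$sq_dist[K]) / ranked$sq_dist[K]

# ASSUMPTION: relative tolerance of 1e-4; this is not read from the package.
eps_assumed <- 1e-4
tied <- rel_gap < eps_assumed
cat("relative gap between neighbours", K, "and", K + 1, ":", rel_gap, "\n")
cat("treated as a tie under the assumed tolerance:", tied, "\n")

# Votes with exactly K neighbours versus with the tied neighbour included.
votes_k   <- table(ranked$label[1:K])
votes_all <- table(ranked$label)
print(votes_k)
print(votes_all)
```

With this data, the fifth and sixth nearest rows (rows 5 and 3) are separated by a relative gap on the order of 1e-5, well under the assumed tolerance. If both are counted, the vote is 3 to 3 instead of 3 to 2, so the outcome would depend on the random tie-break and therefore on the seed. The same conclusion holds if the comparison is made on plain rather than squared distances, since the relative gap only gets smaller. Rounding the training values shifts the distances, which may be enough to move the pair out of the tolerance.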