2016-10-02

Drawing a decision boundary in R

I got a vector of predicted class labels back from the knn function. I have one data frame with basic numeric training data, and another data frame for test data. How would I go about plotting a decision boundary for the values returned by the knn function? I have to reproduce my findings on a locked-down machine, so please limit the use of third-party libraries where possible.

I only have two class labels, "orange" and "blue". They are plotted on a simple 2D plot together with the training data. Again, I just want to draw a boundary around the results of the knn function.

Code:

library(class) 

n <- 100 

set.seed(1) 
x <- round(runif(n, 1, n)) 
set.seed(2) 
y <- round(runif(n, 1, n)) 
train.df <- data.frame(x, y) 

set.seed(1) 
x.test <- round(runif(n, 1, n)) 
set.seed(2) 
y.test <- round(runif(n, 1, n)) 
test.df <- data.frame(x.test, y.test) 

k <- knn(train.df, test.df, classes, k=25) 

plot(test.df, col=k) 

classes is a vector of class labels determined by an earlier bit of code.
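To make the snippet above runnable on its own, `classes` has to exist before the `knn` call. A minimal sketch of one way to build it, assuming a simple illustrative rule (orange below the line x + y = 100, blue elsewhere; this rule is hypothetical, not the asker's actual one shown further down):

```r
library(class)

n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))

# Illustrative labels: "orange" in the lower-left region, "blue" elsewhere
classes <- ifelse(x + y < 100, "orange", "blue")

train.df <- data.frame(x, y)
k <- knn(train.df, train.df, classes, k = 25)
```

Any vector of length n with one label per training row works here; knn only requires that the labels line up with the rows of train.df.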

In case you need it, here is my full working code:

library(class) 

n <- 100 
set.seed(1) 
x <- round(runif(n, 1, n)) 
set.seed(2) 
y <- round(runif(n, 1, n)) 

# ============================================================ 
# Bayes Classifier + Decision Boundary Code 
# ============================================================ 

classes <- character(n) 
colours <- character(n) 

for (i in 1:n) 
{ 

    # P(C = j | X = x, Y = y) = prob 
    # "The probability that the class (C) is orange (j) when X is some x, and Y is some y" 
    # Two predictors that influence classification: x, y 
    # If x and y are both under 50, there is a 95% chance of being orange (grouping) 
    # If x and y are both over 50, or if one of them is over 50, the grouping is blue 
    # Algorithm favours whichever grouping has a higher chance of success, then plots using that colour 
    # When prob (from above) is 50%, the boundary is drawn 

    percentChance <- 0 
    if (x[i] < 50 && y[i] < 50) 
    { 
     # 95% chance of orange and 5% chance of blue 
     # Bayes Decision Boundary therefore assigns to orange when x < 50 and y < 50 
     # "colours" is the Decision Boundary grouping, not the plotted grouping 
     percentChance <- 95 
     colours[i] <- "orange" 
    } 
    else 
    { 
     percentChance <- 10 
     colours[i] <- "blue" 
    } 

    if (round(runif(1, 1, 100)) > percentChance) 
    { 
     classes[i] <- "blue" 
    } 
    else 
    { 
     classes[i] <- "orange" 
    } 
} 

boundary.x <- seq(0, 100, by=1) 
boundary.y <- 0 
for (i in 1:101) 
{ 
    if (i > 49) 
    { 
     boundary.y[i] <- -10 # just for the sake of visual consistency, real value is 0 
    } 
    else 
    { 
     boundary.y[i] <- 50 
    } 
} 
df <- data.frame(boundary.x, boundary.y) 

plot(x, y, col=classes) 
lines(df, type="l", lty=2, lwd=2, col="red") 

# ============================================================ 
# K-Nearest neighbour code 
# ============================================================ 

#library(class) 

#n <- 100 

#set.seed(1) 
#x <- round(runif(n, 1, n)) 
#set.seed(2) 
#y <- round(runif(n, 1, n)) 
train.df <- data.frame(x, y) 

set.seed(1) 
x.test <- round(runif(n, 1, n)) 
set.seed(2) 
y.test <- round(runif(n, 1, n)) 
test.df <- data.frame(x.test, y.test) 

k <- knn(train.df, test.df, classes, k=25) 

plot(test.df, col=k) 

I think it would help if you added the 'classes' vector. That way we can work with the contents of that object. –


@and I edited the original post to include my full code. Again, any help is appreciated. – KingDan


Are you in Hamzeh's class? Ryerson? – BDillan

Answers


Get probability predictions for the classes on a grid, and draw a contour line at p = 0.5 (or whatever cutoff point you want). This is also the approach used in Venables and Ripley's classic MASS textbook, and in The Elements of Statistical Learning by Hastie, Tibshirani and Friedman.

# class labels: simple distance from origin 
classes <- ifelse(x^2 + y^2 > 60^2, "blue", "orange") 
classes.test <- ifelse(x.test^2 + y.test^2 > 60^2, "blue", "orange") 

grid <- expand.grid(x=1:100, y=1:100) 
classes.grid <- knn(train.df, grid, classes, k=25, prob=TRUE) # note last argument 
prob.grid <- attr(classes.grid, "prob") 
prob.grid <- ifelse(classes.grid == "blue", prob.grid, 1 - prob.grid) 

# plot the boundary 
contour(x=1:100, y=1:100, z=matrix(prob.grid, nrow=100), levels=0.5, 
     col="grey", drawlabels=FALSE, lwd=2) 
# add points from test dataset 
points(test.df, col=classes.test) 
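A note on the prob attribute used above: with prob=TRUE, knn attaches the proportion of the k neighbours that voted for the winning class, so the raw value is always at least 0.5 and refers to a different class at each grid point. The ifelse line converts it into a single consistent quantity, the estimated probability of "blue", which is what contour needs. A small self-contained sketch of that conversion, using toy data (the variable names here are illustrative):

```r
library(class)

# Toy training set: class determined by whether x exceeds 0.5
set.seed(1)
train <- data.frame(x = runif(20), y = runif(20))
cl <- ifelse(train$x > 0.5, "blue", "orange")

pred <- knn(train, train, cl, k = 5, prob = TRUE)

votes <- attr(pred, "prob")                          # vote share for the WINNING class (>= 0.5)
p.blue <- ifelse(pred == "blue", votes, 1 - votes)   # consistent probability of "blue"
```

After this conversion, p.blue is comparable across all points, so thresholding it at 0.5 (as the contour call does on the grid) recovers the decision boundary.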


See also essentially the same question on CrossValidated.