R - findCorrelation() (caret package) when setting exact = TRUE

Following the findCorrelation() documentation, I ran the official example shown below, and I am confused by the details of its output.

Code:

library(caret) 

R1 <- structure(c(1, 0.86, 0.56, 0.32, 0.85, 0.86, 1, 0.01, 0.74, 0.32, 
        0.56, 0.01, 1, 0.65, 0.91, 0.32, 0.74, 0.65, 1, 0.36, 
        0.85, 0.32, 0.91, 0.36, 1), 
       .Dim = c(5L, 5L)) 


colnames(R1) <- rownames(R1) <- paste0("x", 1:ncol(R1)) 

findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE,
                verbose = TRUE)

Result:

> findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE, verbose = TRUE) 
## Compare row 1 and column 5 with corr 0.85 
## Means: 0.648 vs 0.545 so flagging column 1 
## Compare row 5 and column 3 with corr 0.91 
## Means: 0.53 vs 0.49 so flagging column 5 
## Compare row 3 and column 4 with corr 0.65 
## Means: 0.33 vs 0.352 so flagging column 4 
## All correlations <= 0.6 
## [1] "x1" "x5" "x4"

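Even after reading the source file, I don't understand how the computation works, i.e. why row 1 and column 5 are compared first and how the means are calculated.

I hope someone can explain the algorithm using my example.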

Source

2017-11-17 Jack

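First, it determines the average absolute correlation for each variable. Columns x1 and x5 have the highest averages (mean(c(0.85, 0.56, 0.32, 0.86)) and mean(c(0.85, 0.91, 0.36, 0.32)), respectively), so it looks to remove one of them in the first step. It finds x1 to be the most globally offensive, so it removes it.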
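The per-variable averages can be checked by hand. Below is a minimal sketch in base R, with R1 rebuilt as in the question; `avg_abs_cor` is a name introduced here for illustration and is not part of caret.

```r
# Correlation matrix from the question (symmetric, so fill order does not matter)
R1 <- matrix(c(1,    0.86, 0.56, 0.32, 0.85,
               0.86, 1,    0.01, 0.74, 0.32,
               0.56, 0.01, 1,    0.65, 0.91,
               0.32, 0.74, 0.65, 1,    0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5,
             dimnames = list(paste0("x", 1:5), paste0("x", 1:5)))

# Mean absolute correlation of each variable with the others.
# The diagonal is set to NA so the self-correlation of 1 is ignored.
A <- abs(R1)
diag(A) <- NA
avg_abs_cor <- colMeans(A, na.rm = TRUE)

print(round(avg_abs_cor, 3))

# Variables sorted from highest to lowest average
print(names(sort(avg_abs_cor, decreasing = TRUE)))
```

With this matrix, x1 and x5 come out on top, which is consistent with the first verbose line comparing row 1 and column 5.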

After that, it recomputes and compares x5 and x3 using the same process.

It stops after removing three columns, since all remaining pairwise correlations are below your threshold.
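To make the "Means: a vs b" lines concrete, here is a simplified sketch of the procedure. It is not caret's code. It is a reading inferred from the verbose output in the question, and `trace_exact` is a hypothetical helper name. The assumption is that the first mean is the average absolute correlation of row i with the variables still in play, and the second is the average over all remaining rows except row j. Under that assumption the sketch flags the same columns, in the same order, as the output above.

```r
R1 <- matrix(c(1,    0.86, 0.56, 0.32, 0.85,
               0.86, 1,    0.01, 0.74, 0.32,
               0.56, 0.01, 1,    0.65, 0.91,
               0.32, 0.74, 0.65, 1,    0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5,
             dimnames = list(paste0("x", 1:5), paste0("x", 1:5)))

# Hypothetical helper: simplified trace of the exact = TRUE behaviour
trace_exact <- function(R, cutoff) {
  A <- abs(R)
  diag(A) <- NA
  # Visit variables from highest to lowest mean absolute correlation
  ord <- order(colMeans(A, na.rm = TRUE), decreasing = TRUE)
  A <- A[ord, ord]
  vars <- colnames(A)
  p <- length(vars)
  dropped <- rep(FALSE, p)

  for (i in seq_len(p - 1)) {
    # Stop once no remaining pair exceeds the cutoff
    if (!any(A > cutoff, na.rm = TRUE)) break
    for (j in (i + 1):p) {
      if (dropped[i] || dropped[j]) next
      if (A[i, j] <= cutoff) next
      mn_i    <- mean(A[i, ],  na.rm = TRUE)  # row i vs. remaining variables
      mn_rest <- mean(A[-j, ], na.rm = TRUE)  # all remaining rows except row j
      k <- if (mn_i > mn_rest) i else j       # flag i if it is worse than the rest
      cat(sprintf("%s vs %s (corr %.2f): means %.3f vs %.3f, flag %s\n",
                  vars[i], vars[j], A[i, j], mn_i, mn_rest, vars[k]))
      dropped[k] <- TRUE
      A[k, ] <- NA  # blank the flagged variable so later means ignore it
      A[, k] <- NA
    }
  }
  vars[dropped]
}

flagged <- trace_exact(R1, cutoff = 0.6)
print(flagged)
```

Because flagged variables are blanked out before the next comparison, the means are effectively recomputed at every step. That is why the numbers in the second and third verbose lines differ from the initial per-variable averages.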

Source

2017-11-20 14:36:48 topepo
