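Centroid distance calculation in PCA space and "feature space" diverges

I'm measuring the centroids of ~20 treatments and 3 groups in both PCA space and "feature space". If I understood my math teachers correctly, the distance between two centroids should be the same in both spaces. However, with the way I calculate them, they are not, and I'm wondering whether the way I'm doing the maths in either of them is wrong.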
I'm using the infamous wine dataset to illustrate my method as an MWE:
library(ggbiplot)
data(wine)
treatments <- 1:2 #treatments to be considered for this calculation
wine.pca <- prcomp(wine[treatments], scale. = TRUE)
#calculate the centroids for the feature/treatment space and the pca space
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
wine$groups <- wine.class
feature.centroids <- aggregate(wine[treatments], list(Type = wine$groups), mean)
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean)
pca.centroids
feature.centroids
#calculate distance between the centroids of barolo and grignolino
dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean")
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean")
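The last two lines return 1.468087 for the distance in the feature space and 1.80717 for the distance in the PCA space, which suggests there is a fly in the ointment...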
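
For reference, this is a sketch of the reasoning I have in mind for why the two distances should match. It assumes that PCA is nothing more than centering followed by an orthogonal rotation, with all components kept and no rescaling of the features; the symbols $W$, $\mu$, $\bar{x}_g$ and $\bar{z}_g$ are my own notation and do not come from the code above.

```latex
% Assumed notation: x_i are the feature vectors, \mu is the overall mean,
% W is the orthogonal matrix of principal axes (W^\top W = W W^\top = I),
% and \bar{x}_g, \bar{z}_g are the centroids of group g in each space.
\begin{align}
  z_i &= W^\top (x_i - \mu)
      && \text{(PCA scores)} \\
  \bar{z}_g &= W^\top (\bar{x}_g - \mu)
      && \text{(the mean is linear)} \\
  \bar{z}_a - \bar{z}_b &= W^\top (\bar{x}_a - \bar{x}_b) \\
  \lVert \bar{z}_a - \bar{z}_b \rVert^2
      &= (\bar{x}_a - \bar{x}_b)^\top W W^\top (\bar{x}_a - \bar{x}_b)
       = \lVert \bar{x}_a - \bar{x}_b \rVert^2
\end{align}
```

If any of those assumptions does not hold for my call to prcomp, the last equality would not have to hold either.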