
r - Calculations of centroid distances in PCA space and 'feature-space' diverge

Reposted. Author: 行者123. Updated: 2023-11-30 09:53:10

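I am measuring centroids in PCA space and in 'feature space' across roughly 20 treatments and 3 groups. If I understood my math teacher correctly, the distances between the centroids should be the same in both spaces. However, the way I calculate them, they are not, and I wonder whether my calculation is wrong.

I use the well-known Wine dataset to illustrate my approach as an MWE: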

library(ggbiplot)
data(wine)
treatments <- 1:2 # treatments (columns) to be considered for this calculation
wine.pca <- prcomp(wine[treatments], scale. = TRUE)
#calculate the centroids for the feature/treatment space and the pca space
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
wine$groups <- wine.class
feature.centroids <- aggregate(wine[treatments], list(Type = wine$groups), mean)
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean)
pca.centroids
feature.centroids
#calculate distance between the centroids of barolo and grignolino
dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean")
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean")

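The last two lines return a distance of 1.468087 in feature space and 1.80717 in PCA space, which suggests something is off...

Best answer

This is caused by scaling and centering: without them, the distances in the original feature space and in PCA space are exactly the same. (Strictly, the scaling is what changes the distance; centering shifts every point by the same vector, so it leaves the difference between two centroids unchanged.)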

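wine.pca <- prcomp(wine[treatments], scale. = FALSE, center = FALSE)
# recompute the PCA-space centroids from the new scores
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean)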

dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean")
# 1
# 2 1.468087
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean")
# 1
# 2 1.468087
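
The reason can be sketched in a few lines. The notation here is introduced only for illustration: $\mu$ is the vector of column means, $D$ the diagonal matrix of column standard deviations (the identity when scaling is off), $R$ the orthogonal rotation matrix returned by prcomp, and $\bar{x}_a$, $\bar{x}_b$ the feature-space centroids of two groups. The argument assumes that all principal components are kept.

```latex
\begin{align*}
  z &= R^{\top} D^{-1} (x - \mu)
      && \text{(PCA score of an observation } x\text{)} \\
  \bar{z}_a - \bar{z}_b &= R^{\top} D^{-1} (\bar{x}_a - \bar{x}_b)
      && \text{(the map is affine, so } \mu \text{ cancels)} \\
  \lVert \bar{z}_a - \bar{z}_b \rVert &= \lVert D^{-1} (\bar{x}_a - \bar{x}_b) \rVert
      && \text{(} R \text{ is orthogonal, so it preserves length)}
\end{align*}
```

With $D = I$ the right-hand side is the raw feature-space distance. With scaling turned on it is the distance between the centroids of the standardized features, which is why the two numbers in the question differ.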

Another way to get matching distances is to scale/center the original data first and then apply PCA with scaling/centering, as follows:

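wine[treatments] <- scale(wine[treatments], center = TRUE)
wine.pca <- prcomp(wine[treatments], scale. = TRUE)
# recompute both sets of centroids from the scaled data and the new scores
feature.centroids <- aggregate(wine[treatments], list(Type = wine$groups), mean)
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean)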

dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean")
# 1
# 2 1.80717
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean")
# 1
# 2 1.80717
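
The same check can be run without ggbiplot. The following is a minimal sketch, not the answer's own code: it assumes the built-in iris data as a stand-in for the wine data, and centroid_dist is a hypothetical helper written for this illustration. It compares centroid distances in raw feature space, standardized feature space, and the two corresponding PCA spaces, keeping all components.

```r
# Sketch using the built-in iris data (an assumption; the question uses ggbiplot's wine data)
X <- iris[, 1:4]
g <- iris$Species

# Hypothetical helper: Euclidean distance between the centroids of groups a and b
centroid_dist <- function(m, groups, a, b) {
  m <- as.matrix(m)
  ca <- colMeans(m[groups == a, , drop = FALSE])
  cb <- colMeans(m[groups == b, , drop = FALSE])
  sqrt(sum((ca - cb)^2))
}

pca_scaled   <- prcomp(X, center = TRUE, scale. = TRUE)
pca_unscaled <- prcomp(X, center = TRUE, scale. = FALSE)

d_raw          <- centroid_dist(X, g, "setosa", "versicolor")
d_std          <- centroid_dist(scale(X), g, "setosa", "versicolor")
d_pca_scaled   <- centroid_dist(pca_scaled$x, g, "setosa", "versicolor")
d_pca_unscaled <- centroid_dist(pca_unscaled$x, g, "setosa", "versicolor")

# Scaled PCA should match the standardized features
print(isTRUE(all.equal(d_std, d_pca_scaled)))
# Centered but unscaled PCA should match the raw features
print(isTRUE(all.equal(d_raw, d_pca_unscaled)))
```

If both comparisons print TRUE, the rotation itself is not what changes the distance; the standardization step is.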

Regarding "r - Calculations of centroid distances in PCA space and 'feature-space' diverge", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41002095/
