
r - How to perform a PCA for each group in a dataset with multiple groups?

Reposted. Author: 行者123. Updated: 2023-12-04 23:21:12

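I have a dataset of individuals from four populations, four treatments (substrates), and three replicates. Each individual belongs to exactly one population, treatment, and replicate combination. I took four measurements on each individual. I want to do a PCA of those measurements for each population, substrate, and replicate combination.

I know how to do a PCA on all individuals, and I could split the dataset into separate datasets for each combination of population, substrate, and replicate, then run a PCA on each new dataset.

How can I run the PCA on the complete dataset so that I most efficiently get separate PC1, PC2, ... results for each population, substrate, and replicate combination? I was thinking of converting the dataset to a list, but I am not sure how to apply the princomp function to a list. Am I on the right track?

Sample data: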

TestData<- structure(list(Location = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D"),
Substrate = c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D",
"A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D",
"A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D",
"A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D"),
Replicate = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
Adult_Weight = c(0.0092, 0.0083, 0.0088, 0.0077, 0.0088, 0.01,
0.0099, 0.011, 0.0078, 0.0086, 0.0071, 0.0093,
0.0111, 0.01, 0.0097, 0.0091, 0.0083, 0.0098,
0.0093, 0.009, 0.0114, 0.0087, 0.0094, 0.0096,
0.0099, 0.0105, 0.0091, 0.0115, 0.0106, 0.0104,
0.0113, 0.0115, 0.0107, 0.0126, 0.0106, 0.0101,
0.0095, 0.0113, 0.0111, 0.0118, 0.0114, 0.0123,
0.0119, 0.0103, 0.0119, 0.0116, 0.0112, 0.0114),
Adult_Thorax_Width = c(1.31, 1.31, 1.43, 1.45, 1.52, 1.43, 1.57, 1.45, 1.43, 1.54, 1.32, 1.49,
1.58, 1.36, 1.42, 1.45, 1.48, 1.38, 1.55, 1.46, 1.52, 1.42, 1.6, 1.49,
1.48, 1.58, 1.51, 1.53, 1.54, 1.76, 1.63, 1.62, 1.44, 1.51, 1.53, 1.58,
1.46, 1.94, 1.54, 2.09, 1.5, 1.65, 1.86, 1.54, 1.8, 1.98, 1.82, 1.63),
Adult_Wing_Length = c(1359L, 1377L, 1555L, 1559L, 1562L, 1578L, 1580L, 1588L, 1597L, 1598L, 1603L, 1605L,
1612L, 1614L, 1616L, 1617L, 1623L, 1628L, 1639L, 1642L, 1643L, 1649L, 1651L, 1652L,
1653L, 1653L, 1654L, 1656L, 1656L, 1656L, 1662L, 1664L, 1665L, 1668L, 1670L, 1670L,
1671L, 1672L, 1674L, 1682L, 1685L, 1687L, 1688L, 1694L, 1698L, 1698L, 1707L, 1708L),
Adult_Leg_Length = c(414L, 390L, 627L, 541L, 430L, 450L, 451L, 462L, 443L, 582L, 435L, 579L,
499L, 418L, 444L, 646L, 589L, 466L, 435L, 477L, 450L, 606L, 660L, 450L,
446L, 480L, 462L, 438L, 483L, 454L, 492L, 457L, 463L, 499L, 470L, 474L,
627L, 478L, 473L, 496L, 666L, 499L, 480L, 461L, 450L, 483L, 460L, 584L)),
.Names = c("Location", "Substrate", "Replicate", "Weight", "Thorax_Width", "Wing_Length", "Leg_Length"),
row.names = c(NA, 48L),
class = "data.frame")

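Best answer

If I understand the composition of your data correctly, you should enter your populations and treatments as factor variables and keep the three replicates as separate rows. The column types would be something like: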

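  • Column 1, population: factor
  • Column 2, treatment: factor
  • Columns 3-6, measurements: numeric (4 columns in total)
  • The object as a whole should preferably be a "data.frame", because the columns of a "data.frame" can have different classes (unlike, for example, a "matrix").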
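The layout described above can be sketched as follows. This is only an illustration on a small made-up data frame (the name `toy` and its values are hypothetical, not the asker's data): the grouping columns are converted to factors and the column classes are then checked.

```r
# Hypothetical toy data: two grouping columns and two measurements
toy <- data.frame(
  Location   = c("A", "A", "B", "B"),
  Substrate  = c("A", "B", "A", "B"),
  Weight     = c(0.0092, 0.0083, 0.0111, 0.0100),
  Leg_Length = c(414, 390, 499, 418),
  stringsAsFactors = FALSE
)

# Convert the grouping columns to factors
group_cols <- c("Location", "Substrate")
toy[group_cols] <- lapply(toy[group_cols], factor)

# One class per column: factor for the groups, numeric for the measurements
col_classes <- sapply(toy, class)
print(col_classes)
```

The same two conversion lines should carry over to a real data frame once `group_cols` names its grouping columns.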

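Here is an example that stratifies the built-in iris dataset by a factor variable, here 'iris$Species'. If you have several factors to stratify over, you can pass a list of two (or more) factors as the INDICES argument. Are you sure you don't actually mean a single PCA with annotation? That is easy to do by converting the factor variables to numeric and using them to annotate the scatterplot, for example through the 'col' (=color) and 'pch' (=symbol) parameters.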
    data(iris) # Load the example Iris-dataset
    class(iris)
    lapply(iris, FUN=class)
    #> class(iris)
    #[1] "data.frame"
    #>
    #> lapply(iris, FUN=class)
    #$Sepal.Length
    #[1] "numeric"
    #
    #$Sepal.Width
    #[1] "numeric"
    #
    #$Petal.Length
    #[1] "numeric"
    #
    #$Petal.Width
    #[1] "numeric"
    #
    #$Species
    #[1] "factor"

    par(mfrow=c(2,2), mar=c(4,4,2,1))
    # Separate PCA plot for each Species
    # by() calls our PCA function once for each unique value of INDICES
    by(iris, INDICES=iris$Species, FUN=function(z){
      # Use numeric fields for the PCA
      pca <- prcomp(z[,unlist(lapply(z, FUN=class))=="numeric"])
      plot(pca$x[,1:2], pch=16, main=z[1,"Species"]) # 2 first principal components
      z
    })

    # Color annotation
    # Use numeric fields for the PCA
    pca <- prcomp(iris[,unlist(lapply(iris, FUN=class))=="numeric"])
    plot(pca$x[,1:2], pch=16, col=as.numeric(iris[,"Species"]), main="Color annotation") # 2 first principal components
    legend("bottom", pch=16, col=unique(as.numeric(iris[,"Species"])), legend=unique(iris[,"Species"]))

[Figure: PCA example]

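Notice that the PCA axes differ between the first three panels, counting from the top-left. This is because each group-wise PCA is computed from that group's own covariance matrix, and those matrices are not the same.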
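To see this numerically, the following sketch fits one `prcomp` per species with `split` and `lapply` and compares the PC1 loadings. The names `groups`, `pcas` and `pc1_loadings` are introduced here for illustration only. This is also one possible way to apply a PCA function to a list of data frames, as the question asks. Keep in mind that the sign of a principal component is arbitrary, so only differences in magnitude are meaningful.

```r
# Split the four numeric iris columns by species (a list of data frames)
groups <- split(iris[, 1:4], iris$Species)

# Fit one PCA per group; each uses only that group's covariance matrix
pcas <- lapply(groups, prcomp)

# PC1 loadings, one column per species
pc1_loadings <- sapply(pcas, function(p) p$rotation[, "PC1"])
print(round(pc1_loadings, 2))

# Scores (PC1, PC2, ...) for one group
head(pcas$setosa$x)
```

Each element of `pcas` is an ordinary `prcomp` object, so `pcas$versicolor$x` or `summary(pcas$virginica)` can be used as usual.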

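Alternatively, if you want a single PCA but simply want to plot the observations of each category in their own panel, you could try something along these lines: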
    par(mfrow=c(1,3))
    # Compute the PCA
    pca <- prcomp(iris[,unlist(lapply(iris, FUN=class))=="numeric"])
    # Apply a plotting function over unique values of iris$Species, notice we always plot the same 'pca' object in all categories
    lapply(unique(iris$Species), FUN=function(z) {
      plot(pca$x[which(z==iris$Species),1:2], xlim=extendrange(pca$x[,1]), ylim=extendrange(pca$x[,2]), pch=16, main=z)
    })

[Figure: pca2]
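If the goal is the per-group numbers rather than the plots, the shared score matrix can also be split by group. A minimal sketch, in which `scores_by_species` and `pc_means` are names made up for this illustration:

```r
# One PCA on all observations
pca <- prcomp(iris[, sapply(iris, is.numeric)])

# Split the shared score matrix by species (a list of data frames)
scores_by_species <- split(as.data.frame(pca$x), iris$Species)

# All groups share the same PC axes, so their group means are comparable
pc_means <- t(sapply(scores_by_species, colMeans))
print(round(pc_means[, 1:2], 2))
```

Unlike the group-wise PCAs, the scores here all live in one coordinate system, which is what makes comparisons between groups meaningful.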

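Edit:

From the help file of the 'by' function:
"INDICES: a factor or a list of factors, each of length nrow(data)."

So if we supply the indices to the by-function as a list, we can stratify the data according to several factor variables. Here is an artificial example in which "first" and "second" are two factors used simultaneously to stratify the data. Extending this to three (or more) variables should be straightforward: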
    # Use data.frame() rather than cbind() so the measurement columns stay numeric
    ex <- data.frame(matrix(rnorm(400), ncol=4), first = c("A", "B"), second = c("foo", "bar", "asd", "fgh", "jkl"))
    by(ex, INDICES=list(ex[,"first"], ex[,"second"]), FUN=function(z) z)
    # Modify the above function provided in FUN to suit your needs
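As a sketch of the three-factor case, the block below runs one `prcomp` per Location, Substrate and Replicate combination on simulated data. The simulated values, the five individuals per combination, and the names `design`, `sim`, `measure_cols` and `pca_by_group` are all assumptions made for this illustration. In the sample TestData from the question, each combination contains exactly one row, so a per-combination PCA there would need more individuals per cell.

```r
set.seed(1)  # simulated data, for illustration only

# 4 locations x 4 substrates x 3 replicates, 5 individuals per combination
design <- expand.grid(Location = LETTERS[1:4],
                      Substrate = LETTERS[1:4],
                      Replicate = 1:3,
                      Individual = 1:5)

# Four random measurement columns; data.frame() names them X1..X4
sim <- data.frame(design[, 1:3], matrix(rnorm(nrow(design) * 4), ncol = 4))
measure_cols <- paste0("X", 1:4)

# by() with a list of three factors: one prcomp per combination
pca_by_group <- by(sim,
                   INDICES = list(sim$Location, sim$Substrate, sim$Replicate),
                   FUN = function(z) prcomp(z[, measure_cols], scale. = TRUE))

# PC scores for Location A, Substrate B, Replicate 1
pca_by_group[["A", "B", "1"]]$x
```

The result is a 4 x 4 x 3 array of `prcomp` objects indexed by the factor levels. Whether to set `scale. = TRUE` depends on the measurements; it is used here because the question's variables are on very different scales.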

Regarding "r - How to perform a PCA for each group in a dataset with multiple groups?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26297028/
