r - Summing 2 distance matrices to obtain a third 'overall' distance matrix (ecology)


Reposted by: 行者123. Updated: 2023-12-02 17:32:50

I am an ecologist and mainly use the vegan R package.

I have 2 matrices (samples x abundances) (see the data below):

matrix 1: nrow = 6 replicates * 24 sites, ncol = 15 species abundances (fish)

matrix 2: nrow = 3 replicates * 24 sites, ncol = 10 species abundances (invertebrates)

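The sites are the same in both matrices. I want to obtain an overall Bray-Curtis dissimilarity between pairs of sites (taking both matrices into account). I see 2 options:

Option 1: average the fish and macro-invertebrate abundances across replicates (at the site scale), cbind the two mean-abundance matrices (nrow = 24 sites, ncol = 15 + 10 mean abundances) and compute Bray-Curtis.

Option 2: for each assemblage, compute the Bray-Curtis dissimilarity between pairs of sites and the distances between site centroids. Then sum the 2 distance matrices.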

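In case I am unclear, I have performed both operations in the R code below.

Please tell me whether option 2 is correct and whether it is more appropriate than option 1.

Thank you in advance.

Pierre

Here is an R code example

Generate data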
library(plyr);library(vegan)

#assemblage 1: 15 fish species, 6 replicates per site
a1.env=data.frame(
  Habitat=paste("H",gl(2,12*6),sep=""),
  Site=paste("S",gl(24,6),sep=""),
  Replicate=rep(paste("R",1:6,sep=""),24))

summary(a1.env)

a1.bio=as.data.frame(replicate(15,rpois(144,sample(1:10,1))))

names(a1.bio)=paste("F",1:15,sep="")

a1.bio[1:72,]=2*a1.bio[1:72,]

#assemblage 2: 10 taxa of macro-invertebrates, 3 replicates per site

a2.env=a1.env[a1.env$Replicate%in%c("R1","R2","R3"),]

summary(a2.env)

a2.bio=as.data.frame(replicate(10,rpois(72,sample(10:100,1))))

names(a2.bio)=paste("I",1:10,sep="")

a2.bio[1:36,]=0.5*a2.bio[1:36,]


#environmental data at the site scale

env=unique(a1.env[,c("Habitat","Site")])

env=env[order(env$Site),]
Option 1: average the abundances and cbind
a1.bio.mean=ddply(cbind(a1.bio,a1.env),.(Habitat,Site),numcolwise(mean))

a1.bio.mean=a1.bio.mean[order(a1.bio.mean$Site),]

a2.bio.mean=ddply(cbind(a2.bio,a2.env),.(Habitat,Site),numcolwise(mean))

a2.bio.mean=a2.bio.mean[order(a2.bio.mean$Site),]

bio.mean=cbind(a1.bio.mean[,-c(1:2)],a2.bio.mean[,-c(1:2)])

dist.mean=vegdist(sqrt(bio.mean),"bray")
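One point worth checking for option 1: Bray-Curtis sums over all columns, so binding the two assemblages side by side implicitly weights each one by its share of the total abundance entering the index. The base-R sketch below illustrates this with made-up counts; the `bray` helper and every number in it are assumptions for illustration, not part of the data above.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Two made-up sites: 3 fish species (low counts), 2 invertebrate taxa (high counts)
fish.a <- c(2, 0, 5); fish.b <- c(4, 1, 1)
inv.a  <- c(40, 80);  inv.b  <- c(60, 30)

bc.fish <- bray(fish.a, fish.b)                      # fish only
bc.inv  <- bray(inv.a, inv.b)                        # invertebrates only
bc.all  <- bray(c(fish.a, inv.a), c(fish.b, inv.b))  # column-bound, as in option 1

# Implicit weight of the fish assemblage = its share of the summed abundances
w.fish <- sum(fish.a + fish.b) / sum(fish.a + fish.b + inv.a + inv.b)
bc.weighted <- w.fish * bc.fish + (1 - w.fish) * bc.inv

print(c(combined = bc.all, weighted = bc.weighted, w.fish = w.fish))
```

In this toy case the fish assemblage carries a weight of only 13/223, so the combined value is driven almost entirely by the invertebrates. The same identity holds for whatever values enter the index, including square-root-transformed means.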
Option 2: for each assemblage, compute the distances between centroids, then sum the 2 distance matrices
a1.dist=vegdist(sqrt(a1.bio),"bray")

a1.coord.centroid=betadisper(a1.dist,a1.env$Site)$centroids

a1.dist.centroid=vegdist(a1.coord.centroid,"eucl")

a2.dist=vegdist(sqrt(a2.bio),"bray")

a2.coord.centroid=betadisper(a2.dist,a2.env$Site)$centroids

a2.dist.centroid=vegdist(a2.coord.centroid,"eucl")
summing up the two distance matrices using Gavin Simpson's fuse()
library(analogue) # fuse() comes from the analogue package
dist.centroid=fuse(a1.dist.centroid,a2.dist.centroid,weights=c(15/25,10/25))
summing up the two Euclidean distance matrices (thanks to Jari Oksanen's correction)
dist.centroid=sqrt(a1.dist.centroid^2 + a2.dist.centroid^2)
and the 'coord.centroid' below is then used for further distance-based analyses (is this correct?)
coord.centroid=cmdscale(dist.centroid,k=23,add=TRUE)
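The square root of summed squares used above can be checked on toy data: for Euclidean distances it equals the distance computed on the column-bound coordinates. The base-R sketch below assumes real-valued coordinates; `x1` and `x2` are made-up stand-ins for the two centroid matrices, not the output of `betadisper()`.

```r
set.seed(1)

# Made-up centroid coordinates for 5 sites in two separate ordination spaces
x1 <- matrix(rnorm(5 * 3), nrow = 5)  # stand-in for assemblage 1 (3 axes)
x2 <- matrix(rnorm(5 * 2), nrow = 5)  # stand-in for assemblage 2 (2 axes)

d1 <- dist(x1)
d2 <- dist(x2)

# Combine the two Euclidean distance matrices
d.combined <- sqrt(d1^2 + d2^2)

# Same result as Euclidean distance in the joint (column-bound) space
d.joint <- dist(cbind(x1, x2))

# Classical MDS with n - 1 axes gives coordinates that reproduce the distances
pco <- cmdscale(d.combined, k = 4)
d.back <- dist(pco)

print(max(abs(as.vector(d.combined) - as.vector(d.joint))))
```

Note that with `add=TRUE`, `cmdscale()` returns a list rather than a matrix, so the coordinates in the line above would be found in `coord.centroid$points`.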
Comparing options 1 and 2
pco.mean=cmdscale(vegdist(sqrt(bio.mean),"bray"))

pco.centroid=cmdscale(dist.centroid)

comparison=procrustes(pco.centroid,pco.mean)

protest(pco.centroid,pco.mean)

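Best answer

A simpler solution is to combine the two dissimilarity matrices flexibly by weighting each matrix. The weights need to sum to 1. For two dissimilarity matrices, the fused dissimilarity matrix is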

d.fused = (w * d.x) + ((1 - w) * d.y)

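where w is a numeric scalar (a length-1 vector) weight. If you have no reason to weight one set of dissimilarities more heavily than the other, just use w = 0.5.

My analogue package has a function that does this for you: fuse(). The example from ?fuse is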
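The formula above can be sketched in base R as follows. This is only an illustration of the weighted sum as written: `fuse_two` is a hypothetical helper and the data are random, so it should not be read as the implementation used by any package.

```r
# Hypothetical helper mirroring d.fused = (w * d.x) + ((1 - w) * d.y)
fuse_two <- function(d.x, d.y, w = 0.5) {
  stopifnot(inherits(d.x, "dist"), inherits(d.y, "dist"),
            attr(d.x, "Size") == attr(d.y, "Size"),  # same number of sites
            length(w) == 1, w >= 0, w <= 1)
  (w * d.x) + ((1 - w) * d.y)
}

set.seed(42)
site.names <- paste0("S", 1:4)

# Made-up data: the same 4 sites described by two different sets of variables
m1 <- matrix(runif(12), nrow = 4, dimnames = list(site.names, NULL))
m2 <- matrix(runif(8),  nrow = 4, dimnames = list(site.names, NULL))

d.x <- dist(m1)
d.y <- dist(m2)

d.fused <- fuse_two(d.x, d.y, w = 0.6)
print(d.fused)
```

Both inputs must describe the same sites in the same order. If the two matrices are on very different scales, the one with larger values dominates the sum regardless of w, so rescaling them beforehand is one option to consider.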

 train1 <- data.frame(matrix(abs(runif(100)), ncol = 10))
 train2 <- data.frame(matrix(sample(c(0,1), 100, replace = TRUE),
                      ncol = 10))
 rownames(train1) <- rownames(train2) <- LETTERS[1:10]
 colnames(train1) <- colnames(train2) <- as.character(1:10)

 d1 <- vegdist(train1, method = "bray")
 d2 <- vegdist(train2, method = "jaccard")

 dd <- fuse(d1, d2, weights = c(0.6, 0.4))
 dd
 str(dd)

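This idea is used in supervised Kohonen networks (supervised SOMs) to bring multiple layers of data into a single analysis.

analogue works closely with vegan, so there will not be any problems running the two packages side by side.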

Regarding "r - Summing 2 distance matrices to obtain a third 'overall' distance matrix (ecology)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/21332959/