
r - Quickly computing a co-occurrence matrix from N labeled vectors

Reposted. Author: 行者123. Updated: 2023-12-02 09:29:47

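I have a matrix in which each of the N rows (one per iteration of a clustering algorithm) holds the cluster that each of the M points (columns) was assigned to:

For example: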

data <- t(rmultinom(50, size = 7, prob = rep(0.1,10)))

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    2    1    1    0    2    1     0
[2,]    3    1    2    0    0    0    0    1    0     0
[3,]    0    1    2    1    0    0    0    0    2     1
[4,]    0    1    1    0    2    0    0    2    0     1
[5,]    3    0    0    0    2    1    0    0    0     1
[6,]    0    1    2    0    0    1    1    2    0     0
[7,]    0    1    0    1    0    1    1    2    1     0
[8,]    3    0    0    2    0    0    0    1    0     1
...

I want to build a co-occurrence matrix in which entry (i,j) is the number of rows in which points i and j were placed in the same cluster.
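To make the definition concrete, here is a minimal sketch on a toy input. The matrix `labels` and the variable `cooc` are made up for illustration and do not appear in the question. Each run contributes a 0/1 matrix marking which pairs of points share a cluster, and the co-occurrence matrix is the sum of those matrices over all runs.

```r
# Toy input (hypothetical): 2 clustering runs (rows) over 3 points (columns).
labels <- matrix(c(1, 1, 2,
                   1, 2, 2), nrow = 2, byrow = TRUE)

# For one run, outer(x, x, "==") is TRUE wherever two points share a cluster.
# Summing these logical matrices over the runs counts the co-occurrences.
cooc <- Reduce(`+`, lapply(seq_len(nrow(labels)), function(n) {
  outer(labels[n, ], labels[n, ], "==")
}))

print(cooc)
#      [,1] [,2] [,3]
# [1,]    2    1    0
# [2,]    1    2    1
# [3,]    0    1    2
```

Points 1 and 2 share a cluster only in the first run and points 2 and 3 only in the second, so both pairs get a count of 1. The diagonal equals the number of runs.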

A simple way to do it is:

coincidences <- matrix(0, nrow=10, ncol=10)
for (n in 1:50){
  for (m in 1:10){
    coincidences[m,] <- coincidences[m,] + as.numeric(data[n,m] == data[n,])
  }
}

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   50   17   21   22   15   14   16   20   18    18
 [2,]   17   50   17   14   17   18   15   14   20    16
 [3,]   21   17   50   20   21   16   16   13   16    20
 [4,]   22   14   20   50   16   18   16   21   18    14
 [5,]   15   17   21   16   50   18   16   17   11    17
 [6,]   14   18   16   18   18   50   18   22   25    13
 [7,]   16   15   16   16   16   18   50   14   20    22
 [8,]   20   14   13   21   17   22   14   50   11    15
 [9,]   18   20   16   18   11   25   20   11   50    18
[10,]   18   16   20   14   17   13   22   15   18    50

How can I make this faster?

Bonus: how can I plot it with ggplot2? (I saw heatmap.2 in gplots, but I don't know whether that is overkill.)

Best answer

Rcpp

Implementing the loops in C++ through the Rcpp package and calling them from R is one way to make this much faster:

library(Rcpp)

data <- t(rmultinom(50, size = 7, prob = rep(0.1,10)))
coincidences <- matrix(0, nrow=10, ncol=10)

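# R implementation
fR <- function(data, coincidences) {
  for (n in seq_len(nrow(data))) {      # each clustering run (row)
    for (m in seq_len(ncol(data))) {    # each point (column)
      # add 1 to row m for every point that shares the cluster of point m in run n
      coincidences[m, ] <- coincidences[m, ] + as.numeric(data[n, m] == data[n, ])
    }
  }
  return(coincidences)
}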


# C++ implementation
cppFunction('NumericMatrix fC(NumericMatrix data, NumericMatrix coincidences) {
  int nrow = data.nrow();   // number of clustering runs
  int ncol = data.ncol();   // number of points
  // Work on a copy so the matrix passed in from R is not modified in place
  NumericMatrix out = clone(coincidences);

  for (int n = 0; n < nrow; n++) {
    for (int m = 0; m < ncol; m++) {
      // p indexes points, so it must be bounded by ncol (not nrow)
      for (int p = 0; p < ncol; p++) {
        if (data(n, m) == data(n, p)) {
          out(m, p) += 1;
        }
      }
    }
  }
  return out;
}')

#Call functions
coincidences <- matrix(0, nrow=10, ncol=10)
c1<-fC(data,coincidences)
coincidences <- matrix(0, nrow=10, ncol=10)
c2<-fR(data,coincidences)
all.equal(c1,c2)
> TRUE


library(microbenchmark)
microbenchmark(fC(data,coincidences),fR(data,coincidences))

> Unit: microseconds
                   expr     min      lq      mean  median      uq     max neval
 fC(data, coincidences)   6.415   6.736   8.88454   7.698   8.660  74.727   100
 fR(data, coincidences) 283.514 290.089 301.84637 293.456 309.973 388.388   100
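If compiling C++ is not an option, a vectorized pure-R formulation may also be worth trying. The sketch below is not part of the original answer, and `cooc_crossprod` is a hypothetical helper name. It builds a 0/1 indicator matrix for each cluster label and sums the cross-products, which gives the same counts as the double loop. No timing is claimed here, so benchmark it on your own data.

```r
# Hypothetical helper (not from the original answer): co-occurrence counts
# computed with one crossprod() per distinct cluster label.
cooc_crossprod <- function(data) {
  m <- ncol(data)
  out <- matrix(0, nrow = m, ncol = m)
  for (k in unique(as.vector(data))) {
    ind <- (data == k) * 1        # N x M indicator: 1 where a point has label k
    # crossprod(ind)[i, j] counts the runs in which points i and j both have label k
    out <- out + crossprod(ind)
  }
  out
}

set.seed(1)  # arbitrary seed, only to make the example reproducible
data <- t(rmultinom(50, size = 7, prob = rep(0.1, 10)))
C2 <- cooc_crossprod(data)
```

The loop runs once per distinct label rather than once per row and point, so the remaining work is done by matrix multiplication.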

Edit

Plotting:

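library(ggplot2)
library(reshape2)
C <- fC(data, coincidences)
# melt() turns the matrix into long format: one row per (Var1, Var2, value) cell
ggplot(melt(C), aes(Var1, Var2, fill = value)) + geom_raster()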
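The plot works because the matrix is first reshaped into long format, with one row per cell. The base-R sketch below shows that shape on a small made-up matrix. The names `C_toy` and `long` are illustrative, and the column names are chosen to match the `Var1`, `Var2` and `value` used in the plotting call.

```r
# Small made-up co-occurrence matrix, for illustration only.
C_toy <- matrix(c(2, 1, 0,
                  1, 2, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)

# Long format: one row per matrix cell, holding its row index, column index and value.
long <- data.frame(
  Var1  = as.vector(row(C_toy)),
  Var2  = as.vector(col(C_toy)),
  value = as.vector(C_toy)
)

head(long, 3)
```

A data frame of this shape can be mapped to a heatmap with x = Var1, y = Var2 and fill = value.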

Regarding "r - Quickly computing a co-occurrence matrix from N labeled vectors", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34343177/
