gpt4 book ai didi

Fast way to permute rows of matrices in Rcpp or RcppArmadillo?(在RCPP或RcppArmadillo中快速置换矩阵行的方法?)

转载 作者:bug小助手 更新时间:2023-10-24 21:59:15 28 4
gpt4 key购买 nike



I'm running a stationary bootstrap algorithm on an N x M matrix, X, where both N and M are on the order of 1500 to 3000.

我在N x M矩阵X上运行静态引导算法,其中N和M都在1500到3000的数量级上。


The bootstrap matrix of index permutations, Y, is N x B, where B is, say, 10,000.

索引排列的引导矩阵Y是N×B,其中B是,比方说,10,000。


In R syntax, the goal is to compute:

在R语法中,目标是计算:


sapply(1:B, function(b) colSums(X[Y[,b],]))

That is, we rearrange the rows of X (with possible duplications) and take the column sums--10,000 times.

也就是说,我们重新排列X的行(可能有重复),并计算列总和--10,000次。


The above code takes about 3 minutes with N = 1500, M = 2000, B = 10000.

当N=1500,M=2000,B=10000时,上述代码大约需要3分钟。


Converting the code to Rcpp reduces it to about 25 seconds:

将代码转换为RCPP将时间缩短到大约25秒:


// [[Rcpp::export]]
NumericVector get_bootstrap_statistic_mean(NumericMatrix x, IntegerMatrix theta){

int nr = x.nrow();
int nc = x.ncol();
int nb = theta.ncol();
NumericMatrix res(nb, nc);


for(int j = 0; j < nc; j++){

NumericMatrix::Column x_col_j = x.column(j);
NumericMatrix::Column res_col_j = res.column(j);

for(int b = 0; b < nb; b++){

IntegerMatrix::Column theta_col_b = theta.column(b);

double sum = 0.0;
for(int i = 0; i < nr; i++){

sum += x_col_j[theta_col_b[i]-1]; //Have to subtract one to map the R indices (start at 1) to the C++ indices (start at 0)

}
res_col_j[b] = sum / nr;
}
}
return res;
}


But this post shows a faster way to get column sums than the nested loops above.

但这篇文章展示了一种比上面的嵌套循环更快地获得列总和的方法。


Thus, is there a way to create the permuted matrices in C++ (Rcpp, RcppArmadillo) that is faster than doing :

因此,有没有一种在C++(Rcpp,RcppArmadillo)中创建置换矩阵的方法比执行以下操作更快:


sapply(1:B, function(b) Arma_colSums(X[Y[,b],])

SApply(1:b,函数(B)arma_colSums(X[Y[,b],])


which takes about 20 seconds for N = 1500, M = 2000, B = 10000 (where Arma_colSums was (more or less) the fastest method from the linked post), so that we can apply Arma_colSums to the permuted matrices?

对于N=1500,M=2000,B=10000(其中Arma_colSums(或多或少)是链接帖子中最快的方法),这大约需要20秒,这样我们就可以将Arma_colSums应用于置换矩阵?


I've looked at the RcppArmadillo subsetting documents and most of it seems like it's fetching row or column vectors rather than rearranging the entire matrix. And the "sub2ind" functionality seems not applicable or if it is applicable, that it would take longer to put the permutation matrix in the required form than using one of the faster approaches above.

我看过RcppArmadillo的子集文档,其中大部分似乎是在获取行或列向量,而不是重新排列整个矩阵。并且“sub2ind”功能似乎不适用,或者如果它适用,则将置换矩阵放入所需形式所需的时间将比使用上述更快的方法之一所需的时间更长。


I've also looked at the bootstrap example in the RcppArmadillo introduction, but it uses IID bootstrap on a single column of X (whereas X here has thousands of columns).

我还研究了RcppArmadillo简介中的引导示例,但它在X的单个列上使用IID引导(而这里的X有数千列)。


I tried chatGPT, but it wasn't able to provide any compilable code.

我尝试了chat GPT,但它无法提供任何可编译代码。


更多回答

Welcome to StackOverflow, and (Rcpp)Armadillo! The fundamental issue here is that these vectors are stored as columns by R (and then 'zero-copy' provided to Armadillo). That means that row-wise access "cuts across the grain". So no "obvious" shortcut I can offer.

欢迎来到StackOverflow和(RCPP)Armadillo!这里的基本问题是,这些向量以列的形式由R存储(然后以零复制方式提供给Armadillo)。这意味着逐行访问“贯穿了方方面面”。所以我不能提供任何“显而易见”的捷径。

I short have gleaned as much! Ugh, well, you're the authority @DirkEddelbuettel, so at least I know not to waste any more time on it and be grateful for the existing improvements. Thank you!

我已经收集了同样多的东西!呃,好吧,你是权威@DirkEddelbuettel,所以至少我知道不要在上面浪费更多的时间,并对现有的改进表示感激。谢谢!

Also, can always parallelize it for perhaps another nontrivial increase in speed.

此外,还可以始终将其并行化,以获得另一次不平凡的速度提升。

That was my thought too, "devils is as always in the details".

这也是我的想法,“魔鬼总是存在于细节中”。

优秀答案推荐

Use matrix multiplication instead:

改用矩阵乘法:


library(Rfast) # for `colTabulate` and `Crossprod`

system.time(Z1 <- get_bootstrap_statistic_mean(X, Y))
#> user system elapsed
#> 28.17 0.01 28.19
system.time(Z2 <- Crossprod(X, colTabulate(Y, N)))
#> user system elapsed
#> 26.30 1.36 3.81

all.equal(Z1, t(Z2)/N)
#> [1] TRUE

Data:

数据:


N <- 15e2L
M <- 2e3L
B <- 1e4L

X <- matrix(runif(N*M), N, M)
Y <- matrix(sample(N, N*B, 1), N, B)

更多回答

That is brilliant, @jblood94! Thank you.

“这真是太棒了,”杰罗姆94!谢谢。

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com