r - Fast parallel bipartite distance computation in R

Reposted by: 行者123 Updated: 2023-12-03 09:26:23

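What is the fastest way to compute bipartite distances in R with a parallelized Rcpp backend? parallelDist is a great package with a C++ backend and multithreading support, but it does not support bipartite distance computation (as far as I know).
My current approach uses parallelDist() to compute the bipartite distance matrix. Besides m1:m2, this also computes m1:m1 and m2:m2, which is very inefficient.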

library(parallelDist)

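# Bipartite distances via the full distance matrix of the stacked rows.
bipartiteDist <- function(matrix1, matrix2) {
  n1 <- nrow(matrix1)
  matrix12 <- rbind(matrix1, matrix2)
  d <- as.matrix(parallelDist(matrix12))
  # Keep only the m1:m2 block; matrix2 may have a different number of rows.
  d[seq_len(n1), (n1 + 1):nrow(matrix12), drop = FALSE]
}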

# Draw one value per cell so that rnorm() output is not recycled across columns.
matrix1 <- abs(matrix(rnorm(10 * 100000), 10, 100000))
matrix2 <- abs(matrix(rnorm(10 * 100000), 10, 100000))

dist <- bipartiteDist(matrix1, matrix2)

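When more than 3 cores are available, this approach is faster than pdist or a pure R implementation. pdist is well suited to computing bipartite distances but does not support multithreading.
Is there a fast implementation of parallelized bipartite distance computation?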
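To make the inefficiency described in the question concrete, the following base-R arithmetic is a rough sketch that counts pairwise distances. It assumes the two 10-row matrices from the example and that the combined approach evaluates every pair of rows in the stacked matrix once; the variable names are illustrative only.

```r
# Row counts of the two matrices in the question's example (assumed here).
n1 <- 10
n2 <- 10

# Pairs actually wanted: only the m1:m2 block.
pairs_needed <- n1 * n2

# Pairs evaluated when the stacked matrix is passed to a dist-style function:
# every unordered pair of rows, including m1:m1 and m2:m2.
pairs_computed <- choose(n1 + n2, 2)

pairs_needed                   # 100
pairs_computed                 # 190
pairs_needed / pairs_computed  # roughly 0.53
```

With equal row counts, close to half of the computed distances are discarded, and as.matrix() additionally materializes the full square matrix before the block is extracted.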

Best answer

The dist.matrix() function from the wordspace package supports parallel computation of bipartite distances.
Benchmarking wordspace against parallelDist:

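# Draw one value per cell so that rnorm() output is not recycled across columns.
matrix1 <- abs(matrix(rnorm(100 * 100000), 100, 100000))
matrix2 <- abs(matrix(rnorm(100 * 100000), 100, 100000))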

library(rbenchmark)
library(parallelDist)
library(wordspace)

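# Full distance matrix of the stacked rows, then extract the m1:m2 block.
bipartiteDist_parallelDist <- function(matrix1, matrix2) {
  n1 <- nrow(matrix1)
  matrix12 <- rbind(matrix1, matrix2)
  d <- as.matrix(parallelDist(matrix12, method = "euclidean"))
  # matrix2 may have a different number of rows than matrix1.
  d[seq_len(n1), (n1 + 1):nrow(matrix12), drop = FALSE]
}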

bipartiteDist_wordspace <- function(matrix1,matrix2){
  wordspace.openmp(threads = wordspace.openmp()$max)
  dist.matrix(matrix1,matrix2, byrow = TRUE, method = "euclidean", convert = FALSE)
}

benchmark("parallelDist" = {
            bd1 <- bipartiteDist_parallelDist(matrix1,matrix2)
          },
          "wordspace" = {
            bd2 <- bipartiteDist_wordspace(matrix1,matrix2)
          },
          replications = 1,
          columns = c("test", "replications", "elapsed",
                      "relative", "user.self", "sys.self"))

plot(bd1,bd2) # yes, both methods give near-identical results

Benchmark results:

          test replications elapsed relative user.self sys.self
1 parallelDist            1   2.120   12.184   126.145    0.523
2    wordspace            1   0.174    1.000     3.749    0.252

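I used 80 threads.
Framework for further speed improvements: the wordspace author acknowledges an emphasis on low memory load rather than speed, so additional speed gains are possible (source).
For example, here is a general framework for Euclidean distance: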

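bipartiteDist3 <- function(matrix1, matrix2) {
  # Row-by-row dot products between the two matrices.
  m1tm2 <- tcrossprod(matrix1, matrix2)
  # Squared Euclidean norms of each row.
  sq1 <- rowSums(matrix1^2)
  sq2 <- rowSums(matrix2^2)
  # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
  out0 <- outer(sq1, sq2, "+") - 2 * m1tm2
  # Rounding can leave tiny negative values; clamp them so sqrt() does not return NaN.
  sqrt(pmax(out0, 0))
}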
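The framework above rests on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. As a sanity check, the sketch below compares that formula with base R's dist() on small toy matrices. The helper name cross_dist and the test data are assumptions made for illustration, and the clamp to zero is a precaution against floating-point rounding rather than part of the original answer.

```r
set.seed(1)
m1 <- matrix(rnorm(5 * 8), nrow = 5)  # 5 toy points in 8 dimensions
m2 <- matrix(rnorm(3 * 8), nrow = 3)  # 3 toy points in 8 dimensions

# Hypothetical helper: cross-distances from squared norms and dot products.
cross_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))  # clamp tiny negative rounding errors
}

# Reference: full distance matrix of the stacked rows, m1:m2 block only.
reference <- as.matrix(dist(rbind(m1, m2)))[1:5, 6:8]
fast <- cross_dist(m1, m2)

# Should be TRUE up to floating-point tolerance.
all.equal(fast, reference, check.attributes = FALSE)
```

Because the heavy lifting happens in tcrossprod(), the speed of this formulation depends largely on the BLAS library that R is linked against.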

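I would be very interested in a parallelized solution optimized for sparse matrices. As far as I know, wordspace is not optimized for sparsity. Parallelizable sparse-matrix implementations exist for equivalents of tcrossprod, rowSums, and outer, for example.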
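As a starting point for the sparse case, the same framework can be written against the Matrix package, which provides tcrossprod() and rowSums() methods for sparse matrices. This is only a single-threaded sketch under that assumption, not the parallel solution asked for; the helper name sparse_cross_dist and the random test matrices are hypothetical.

```r
library(Matrix)

# Hypothetical helper, not part of wordspace or parallelDist.
sparse_cross_dist <- function(a, b) {
  cross <- as.matrix(tcrossprod(a, b))  # sparse product, densified for the result
  sq_a <- rowSums(a^2)                  # squaring keeps the matrix sparse
  sq_b <- rowSums(b^2)
  sqrt(pmax(outer(sq_a, sq_b, "+") - 2 * cross, 0))
}

set.seed(1)
s1 <- rsparsematrix(20, 500, density = 0.01)  # 20 x 500, about 1% non-zero
s2 <- rsparsematrix(15, 500, density = 0.01)  # 15 x 500, about 1% non-zero

d_sparse <- sparse_cross_dist(s1, s2)
dim(d_sparse)  # 20 rows (from s1) by 15 columns (from s2)
```

The result is necessarily a dense 20 x 15 matrix, so the savings come from the sparse product and norms rather than from the output.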

Regarding r - Fast parallel bipartite distance computation in R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64464885/
