
r - Avoiding apply when indexing a matrix by combinations of rows

Reposted. Author: 行者123. Updated: 2023-12-03 16:12:49

I have two inputs in the following format:

domains = list(
O60925 = "PF01920",
P01130 = c("PF07645", "PF00057", "PF00058"),
Q14764 = c("PF11978", "PF01505"),
Q9BX68 = "PF01230",
P46777 = "PF14204")

interactions = structure(c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0), .Dim = c(8L, 8L), .Dimnames = list(c("PF01920",
"PF07645", "PF00057", "PF00058", "PF11978", "PF01505", "PF01230",
"PF14204"), c("PF01920", "PF07645", "PF00057", "PF00058", "PF11978",
"PF01505", "PF01230", "PF14204")))

        PF01920 PF07645 PF00057 PF00058 PF11978 PF01505 PF01230 PF14204
PF01920       1       0       0       0       0       0       1       0
PF07645       0       1       0       1       0       0       0       0
PF00057       0       0       1       1       0       0       0       0
PF00058       0       1       1       1       0       0       0       0
PF11978       0       0       0       0       1       0       0       0
PF01505       0       0       0       0       0       1       0       0
PF01230       1       0       0       0       0       0       1       0
PF14204       0       0       0       0       0       0       0       0

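I want to compute the following output, where each cell holds the sum of all entries of the interactions matrix whose row is a domain of the row protein and whose column is a domain of the column protein (the proteins and their domains come from the domains list).
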
       O60925 P01130 Q14764 Q9BX68 P46777
O60925      1      0      0      1      0
P01130      0      7      0      0      0
Q14764      0      0      2      0      0
Q9BX68      1      0      0      1      0
P46777      0      0      0      0      0

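For context: I have a list of proteins (the names of the domains list) and their Pfam domains (the entries of the domains list), along with a matrix of known Pfam domain-domain interactions (the interactions matrix). I want to count the total number of known domain-domain interactions for each pair of proteins.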
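
To make the definition of a single cell concrete, here is a minimal sketch for the P01130 x P01130 cell only. It hard-codes the 3 x 3 slice of the interactions matrix that covers the three Pfam domains of P01130; the names p01130, sub, cell_by_submatrix and cell_by_pairs are illustrative and not part of the original question.

```r
# The three Pfam domains of protein P01130, copied from the domains list.
p01130 <- c("PF07645", "PF00057", "PF00058")

# The matching 3 x 3 slice of the interactions matrix, written out by hand.
sub <- matrix(c(1, 0, 1,
                0, 1, 1,
                1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(p01130, p01130))

# Option 1: index rows and columns by name, then sum the submatrix.
cell_by_submatrix <- sum(sub[p01130, p01130])

# Option 2: enumerate every (row, column) name pair and index with a
# two-column character matrix, which R matches against the dimnames.
pairs <- as.matrix(expand.grid(p01130, p01130, stringsAsFactors = FALSE))
cell_by_pairs <- sum(sub[pairs])

cell_by_submatrix  # 7, the P01130 x P01130 cell of the desired output
cell_by_pairs      # 7 as well
```

Both forms add up the same nine entries; the first avoids building the table of pairs.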

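In practice both the domains list and the interactions matrix are much larger than these, so I am looking for a fast way to generate this result matrix. So far, though, the only solution I have come up with is an apply loop:
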
proteins = names(domains)
result = matrix(0, nrow = length(proteins), ncol = length(proteins),
dimnames = list(proteins, proteins))
combinations = tidyr::crossing(proteins, proteins)
n_interactions = apply(combinations, 1, function(row) {
domains1 = domains[[row[1]]]
domains2 = domains[[row[2]]]
sum(interactions[as.matrix(tidyr::crossing(domains1, domains2))])
})
result[as.matrix(combinations)] = n_interactions

I am sure there must be a faster way to do this, but how?

Best answer

You can do it like this:

n <- length(domains)
res <- matrix(nrow = n, ncol = n)
res[] <- purrr::pmap_dbl(expand.grid(domains, domains),
~ sum(interactions[.x, .y]))
colnames(res) <- rownames(res) <- names(domains)

Actually, this is not much different from what you did.
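
For readers who prefer to avoid the purrr dependency, the same pairwise sum can be sketched in base R with two nested sapply calls. This is an illustrative variant, not the answerer's code. The interactions matrix is rebuilt here in a compact form (a diagonal plus three symmetric off-diagonal pairs) so that the block runs on its own; pfam, sym and res_base are names made up for this sketch.

```r
domains <- list(
  O60925 = "PF01920",
  P01130 = c("PF07645", "PF00057", "PF00058"),
  Q14764 = c("PF11978", "PF01505"),
  Q9BX68 = "PF01230",
  P46777 = "PF14204")

# Compact reconstruction of the 8 x 8 interactions matrix from the question:
# ones on the diagonal (except PF14204) plus three symmetric pairs.
pfam <- unlist(domains, use.names = FALSE)
interactions <- diag(c(1, 1, 1, 1, 1, 1, 1, 0))
dimnames(interactions) <- list(pfam, pfam)
sym <- rbind(c("PF01920", "PF01230"),
             c("PF07645", "PF00058"),
             c("PF00057", "PF00058"))
interactions[sym] <- 1
interactions[sym[, 2:1]] <- 1

# Outer sapply walks over column proteins, inner sapply over row proteins.
# Each cell is the sum of the named submatrix for that protein pair.
res_base <- sapply(domains, function(d_col) {
  sapply(domains, function(d_row) sum(interactions[d_row, d_col]))
})

res_base
```

The key step is the same as in the answer: interactions[d_row, d_col] extracts the whole submatrix by name in one call, instead of building a table of domain pairs for every protein pair.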

Benchmark:

microbenchmark::microbenchmark(
OP = {
proteins = names(domains)
result = matrix(0, nrow = length(proteins), ncol = length(proteins),
dimnames = list(proteins, proteins))
combinations = tidyr::crossing(proteins, proteins)
n_interactions = apply(combinations, 1, function(row) {
domains1 = domains[[row[1]]]
domains2 = domains[[row[2]]]
sum(interactions[as.matrix(tidyr::crossing(domains1, domains2))])
})
result[as.matrix(combinations)] = n_interactions
},
privefl = {
n <- length(domains)
res <- matrix(nrow = n, ncol = n)
res[] <- purrr::pmap_dbl(expand.grid(domains, domains),
~ sum(interactions[.x, .y]))
colnames(res) <- rownames(res) <- names(domains)
},
times = 10
)

Results:

Unit: microseconds
    expr        min         lq       mean     median         uq       max neval
      OP 208685.225 209913.891 231506.172 210817.264 213071.475 416724.50    10
 privefl    262.885    281.426   1580.779    306.092    396.975  12842.56    10
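
Since the real inputs are much larger, a fully vectorized formulation may also be worth trying. The sketch below is not from the answer above and is not benchmarked here. It encodes the domains list as a 0/1 protein-by-domain membership matrix M and computes the result as M %*% interactions %*% t(M). It assumes that each protein lists a given domain at most once and that every domain appears in the dimnames of interactions; all_domains, membership and res_mat are illustrative names.

```r
domains <- list(
  O60925 = "PF01920",
  P01130 = c("PF07645", "PF00057", "PF00058"),
  Q14764 = c("PF11978", "PF01505"),
  Q9BX68 = "PF01230",
  P46777 = "PF14204")

# Compact reconstruction of the 8 x 8 interactions matrix from the question.
pfam <- unlist(domains, use.names = FALSE)
interactions <- diag(c(1, 1, 1, 1, 1, 1, 1, 0))
dimnames(interactions) <- list(pfam, pfam)
sym <- rbind(c("PF01920", "PF01230"),
             c("PF07645", "PF00058"),
             c("PF00057", "PF00058"))
interactions[sym] <- 1
interactions[sym[, 2:1]] <- 1

# Membership matrix: one row per protein, one column per Pfam domain,
# 1 where the protein contains the domain (assumption: no repeated domains).
all_domains <- rownames(interactions)
membership <- t(vapply(domains,
                       function(d) as.numeric(all_domains %in% d),
                       numeric(length(all_domains))))
colnames(membership) <- all_domains

# (M %*% I %*% t(M))[i, j] sums I[a, b] over all domains a of protein i
# and all domains b of protein j, which is the requested cell value.
res_mat <- membership %*% interactions[all_domains, all_domains] %*% t(membership)

res_mat
```

This replaces the loop over protein pairs with two matrix products, so the cost depends on the matrix dimensions rather than on per-pair function calls. Whether it is faster on the real data would need to be measured.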

Regarding "r - Avoiding apply when indexing a matrix by combinations of rows", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47208951/
