
r - Quickly retrieving p-values from multiple lm() fits in R

Reposted. Author: 行者123. Updated: 2023-12-04 17:17:19

I have a matrix (mat) with dim "13, 20000000" and the following groups:

[1,] "wildtype"  
[2,] "wildtype"
[3,] "wildtype"
[4,] "wildtype"
[5,] "wildtype"
[6,] "wildtype"
[7,] "wildtype"
[8,] "wildtype"
[9,] "wildtype"
[10,] "wildtype"
[11,] "mutant"
[12,] "mutant"
[13,] "mutant"

With the R code below, I run lm() 20M times, once per data point.
lm(mat ~ groups) is really fast. Extracting the p-value for each model with summary(lm1) takes a very long time.

How can I speed up extracting the p-values?
tvals_out <-'/tmp/tvals_lm.csv'

infile <- '/tmp/tempdata.dat'
con <- file(infile, "rb")
dim <- readBin(con, "integer", 2)
mat <- matrix( readBin(con, "numeric", prod(dim)), dim[1], dim[2])
close(con)

groups = factor(c(rep('wt', 10), rep('mut', 3)))
lm1 <- lm(mat ~ groups)

# This is the longest running bit
sum_lm1 <- summary(lm1)

num_pixels <- dim(mat)[2]

result_pvalues <- numeric(num_pixels)

result_pvalues <- vapply(sum_lm1, function(x) x$coefficients[,4][2], FUN.VALUE = 1)

write.table(result_pvalues, tvals_out, sep=',');


outCon <- file(tvals_out, "wb")
writeBin(result_pvalues, outCon)
close(outCon)

Edit:

I've added sample data: 10 data points from the mat object.
m <- c(28, 28, 28, 29, 33, 39, 49, 58, 63,64,30, 27, 24, 20, 17, 19, 33, 49, 56,57,36, 32, 28, 23, 20, 27, 48, 77, 96, 103,27, 26, 26, 23, 21, 23, 33, 46, 53,52,24, 20, 17, 13, 11, 14, 33, 47, 40,32,40, 46, 49, 48, 44, 49, 57, 59, 61,53,22, 24, 26, 32, 38, 39, 44, 53, 59,58,16, 16, 14, 10,7, 14, 34, 55, 62,61,28, 25, 21, 19, 22, 32, 45, 58, 64,61,28, 26, 21, 16, 14, 19, 33, 50, 59,59,17, 16, 15, 14, 17, 25, 38, 54, 61,58,11, 11, 12, 13, 16, 23, 34, 46, 51,45,22, 21, 20, 19, 16, 18, 32, 51, 50,38)

mat <- matrix(m, nrow=13)

Best answer

How about giving the broom package a try?

install.packages("broom")
library(broom)

tidy(lm(mat ~ groups))
# response term estimate std.error statistic p.value
# 1 Y1 (Intercept) 27.000000 7.967548 3.3887465 6.048267e-03
# 2 Y1 groupswt 14.900000 9.084402 1.6401740 1.292246e-01
# 3 Y2 (Intercept) 23.333333 7.809797 2.9877004 1.234835e-02
# 4 Y2 groupswt 11.366667 8.904539 1.2765026 2.280689e-01
# 5 Y3 (Intercept) 44.000000 17.192317 2.5592828 2.655251e-02
# ...and more...

Then extract only the groupswt results (note: there are various ways to do this...):
subset(tidy(lm(mat ~ groups)), term == "groupswt")[, c(1,6)]
# response p.value
# 2 Y1 0.12922460
# 4 Y2 0.22806894
# 6 Y3 0.88113522
# 8 Y4 0.20645833
# 10 Y5 0.10362436
# 12 Y6 0.84642990
# 14 Y7 0.27171390
# 16 Y8 0.15398258
# 18 Y9 0.66351492
# 20 Y10 0.05942893
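
As an alternative that is not part of the answer above, the p-values can also be computed straight from the fitted multi-response model without calling summary() or tidy(). This is a minimal base-R sketch, assuming the same two-level groups factor as in the question; the names fit, sigma2, xtx_inv, se, tstat and pvals are illustrative. It relies on every response column sharing one design matrix, so (X'X)^-1 is computed once and only the residual variance differs per column.

```r
# Sample data from the question: 13 observations (rows) x 10 data points (columns)
m <- c(28, 28, 28, 29, 33, 39, 49, 58, 63, 64, 30, 27, 24, 20, 17, 19, 33, 49, 56, 57,
       36, 32, 28, 23, 20, 27, 48, 77, 96, 103, 27, 26, 26, 23, 21, 23, 33, 46, 53, 52,
       24, 20, 17, 13, 11, 14, 33, 47, 40, 32, 40, 46, 49, 48, 44, 49, 57, 59, 61, 53,
       22, 24, 26, 32, 38, 39, 44, 53, 59, 58, 16, 16, 14, 10, 7, 14, 34, 55, 62, 61,
       28, 25, 21, 19, 22, 32, 45, 58, 64, 61, 28, 26, 21, 16, 14, 19, 33, 50, 59, 59,
       17, 16, 15, 14, 17, 25, 38, 54, 61, 58, 11, 11, 12, 13, 16, 23, 34, 46, 51, 45,
       22, 21, 20, 19, 16, 18, 32, 51, 50, 38)
mat <- matrix(m, nrow = 13)
groups <- factor(c(rep("wt", 10), rep("mut", 3)))

# One call fits all columns at once (an "mlm" object)
fit <- lm(mat ~ groups)

# Residual variance per column: residual sum of squares / residual df
rdf <- fit$df.residual
sigma2 <- colSums(fit$residuals^2) / rdf

# (X'X)^-1 is identical for every column because the design matrix is shared
X <- model.matrix(~ groups)
xtx_inv <- solve(crossprod(X))

# Standard error, t statistic and two-sided p-value for the group effect
# (row 2 of the coefficient matrix, here named "groupswt")
se <- sqrt(xtx_inv[2, 2] * sigma2)
tstat <- coef(fit)[2, ] / se
pvals <- 2 * pt(abs(tstat), df = rdf, lower.tail = FALSE)
```

On the sample data, pvals should agree with the p.value column in the tidy() output above. For the full 13 x 20000000 matrix, the residual matrix alone is as large as mat itself, so processing the columns in chunks may be needed; that part is untested here.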

Regarding "r - Quickly retrieving p-values from multiple lm() fits in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33652502/
