gpt4 book ai didi

r - 将多列添加到 R 中的 data.table

转载 作者:行者123 更新时间:2023-12-04 10:47:43 25 4
gpt4 key购买 nike

我有一个包含序列和读取次数的 data.table,如下所示:

   sequence   num_reads
1: AACCTGCCG 1
2: CGCGCTCAA 12
3: AGTGTGAGC 3
4: TGGGTACAC 11
5: GGCCGCGTG 15
6: CCTTAAGAG 2
7: GCGGAACTG 9
8: GCGTTGTAG 17
9: GTTGTAGCG 20
10: ACACGTGAC 16

我想根据应用具有两个权重和两个 lambda 的 dpois() 的结果,使用 data.table 向该表添加两个新列。正确的输出应该是这样的(基于使用 data.frame):

sequence     num_reads                       clus1                       clus2
1 AACCTGCCG 1 2.553269503552647000377e-03 1.610220613932057849571e-03
2 CGCGCTCAA 12 1.053993989051599418361e-02 2.887608256917401083896e-02
3 AGTGTGAGC 3 2.085170094567994833468e-02 1.717568654860860896672e-02
4 TGGGTACAC 11 1.806846838374168498498e-02 4.331412385376097462508e-02
5 GGCCGCGTG 15 1.324248858039188620275e-03 5.415587646672919558410e-03
6 CCTTAAGAG 2 8.936443262434262332916e-03 6.440882455728230530922e-03
7 GCGGAACTG 9 4.056186780023639942838e-02 7.444615037365168164207e-02
8 GCGTTGTAG 17 2.385595369261770803265e-04 1.274255916864215588610e-03
9 GTTGTAGCG 20 1.196285397159046524451e-05 9.538289904012846548518e-05
10 ACACGTGAC 16 5.793588753921446012421e-04 2.707793823336458478163e-03

但是当我尝试使用 data.table 时,我似乎无法获得正确的结果。这是我尝试过的(基于围绕该主题提出的类似问题):

    pois = function(n, p, l){return(dpois(as.numeric(as.character(n)), l)*p) }
x = x[, c(paste("clus", seq(1,2), sep = '')) := pois(num_reads, c(0.4,0.6), c(7,8)), by = seq_len(nrow(x))]

结果如下:

          sequence num_reads                             clus1                      clus2
1: AACCTGCCG 1 2.553269503552647000377e-03 2.553269503552647000377e-03
2: CGCGCTCAA 12 1.053993989051599418361e-02 1.053993989051599418361e-02
3: AGTGTGAGC 3 2.085170094567994833468e-02 2.085170094567994833468e-02
4: TGGGTACAC 11 1.806846838374168498498e-02 1.806846838374168498498e-02
5: GGCCGCGTG 15 1.324248858039188620275e-03 1.324248858039188620275e-03
6: CCTTAAGAG 2 8.936443262434262332916e-03 8.936443262434262332916e-03
7: GCGGAACTG 9 4.056186780023639942838e-02 4.056186780023639942838e-02
8: GCGTTGTAG 17 2.385595369261770803265e-04 2.385595369261770803265e-04
9: GCGTTGTAG 20 1.196285397159046524451e-05 1.196285397159046524451e-05
10: ACACGTGAC 16 5.793588753921446012421e-04 5.793588753921446012421e-04

我使用 data.table 而不是 data.frame 的原因是我的真实数据有 100,000 行。我研究了 thisthis 的答案,但我一直无法想出解决方案。

如果您有任何提示,我们将不胜感激。谢谢!

最佳答案

我们可以试试

x[,  paste("clus", seq(1,2), sep = ''):= 
as.list(pois(num_reads, c(0.4,0.6), c(7,8))), by = seq_len(nrow(x))]
x
# sequence num_reads clus1 clus2
#1: AACCTGCCG 1 0.00255326950 0.0016102206
#2: CGCGCTCAA 12 0.01053993989 0.0288760826
#3: AGTGTGAGC 3 0.02085170095 0.0171756865
#4: TGGGTACAC 11 0.01806846838 0.0433141239
#5: GGCCGCGTG 15 0.00132424886 0.0054155876
#6: CCTTAAGAG 2 0.00893644326 0.0064408825
#7: GCGGAACTG 9 0.04056186780 0.0744461504
#8: GCGTTGTAG 17 0.00023855954 0.0012742559
#9: GTTGTAGCG 20 0.00001196285 0.0000953829
#10:ACACGTGAC 16 0.00057935888 0.0027077938

关于r - 将多列添加到 R 中的 data.table,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42059460/

25 4 0