
r - How can I keep the join column unchanged in a data.table non-equi join?

Reposted. Author: 行者123. Updated: 2023-12-04 07:50:17

I am trying to use data.table's non-equi join feature to remove the rows of a data.frame whose posn values do not fall within any of the ranges specified in another data.table.

Here is what my data look like:

library(data.table)
df.cov <-
structure(list(posn = c(1, 2, 3, 165, 1000), att = c("a", "b",
"c", "d", "e")), .Names = c("posn", "att"), row.names = c(NA,
-5L), class = "data.frame")
df.exons <-
structure(list(start = c(2889, 2161, 277, 164, 1), end = c(3329,
2826, 662, 662, 168)), .Names = c("start", "end"), row.names = c(NA,
-5L), class = "data.frame")

setDT(df.cov)
setDT(df.exons)

df.cov
#    posn att
# 1:    1   a
# 2:    2   b
# 3:    3   c
# 4:  165   d
# 5: 1000   e
df.exons # ranges of `posn` to include
#    start  end
# 1:  2889 3329
# 2:  2161 2826
# 3:   277  662
# 4:   164  662
# 5:     1  168

Here is what I tried:
df.cov[df.exons, on = .(posn >= start, posn <= end), nomatch = 0]
#    posn att posn.1
# 1:  164   d    662
# 2:    1   a    168
# 3:    1   b    168
# 4:    1   c    168
# 5:    1   d    168

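As you can see, the posn column of df.cov has been changed as well: it now holds the start values of the matching ranges, and an extra posn.1 column holds their end values. The expected result is: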
#    posn att
# 1:  165   d
# 2:    1   a
# 3:    2   b
# 4:    3   c
# 5:  165   d
# The row order doesn't matter; I'll sort by posn later.
# It is also fine if the duplicated rows are removed; otherwise I'll do that in a later step.
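The replacement of posn described above can be checked directly. The following is a minimal sketch that rebuilds the same data with data.table() instead of structure() (only to keep the block short) and runs the attempted join; the name res is a hypothetical choice for illustration.

```r
library(data.table)

# Same data as in the question, built directly as data.tables
df.cov   <- data.table(posn = c(1, 2, 3, 165, 1000),
                       att  = c("a", "b", "c", "d", "e"))
df.exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                       end   = c(3329, 2826, 662, 662, 168))

# The attempted join: x = df.cov, i = df.exons
res <- df.cov[df.exons, on = .(posn >= start, posn <= end), nomatch = 0]

# In the result, posn carries each matching range's start and posn.1 its end,
# so the original positions 2, 3 and 165 no longer appear in posn
print(res)
```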

How can I get the desired output with a data.table non-equi join?

Best answer

You can also use %inrange%:

df.cov[posn %inrange% df.exons]

The result is:

   posn att
1:    1   a
2:    2   b
3:    3   c
4:  165   d


As you can see, this leaves the values of the posn column unchanged.
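Why %inrange% works here: it takes a list of two vectors (lower and upper bounds), and a data.table is a list, so the start and end columns of df.exons act as the bounds. It returns a logical vector that is TRUE where posn lies inside at least one closed interval, and that vector simply filters the rows of df.cov. The brute-force comparison with vapply() below is my own illustration, not part of the answer, and the names in_any, in_any_base and kept are hypothetical.

```r
library(data.table)

df.cov   <- data.table(posn = c(1, 2, 3, 165, 1000),
                       att  = c("a", "b", "c", "d", "e"))
df.exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                       end   = c(3329, 2826, 662, 662, 168))

# %inrange% uses the first two list elements (start, end) as closed bounds
in_any <- df.cov$posn %inrange% df.exons

# Brute-force equivalent: is each posn inside at least one [start, end] range?
in_any_base <- vapply(df.cov$posn,
                      function(p) any(p >= df.exons$start & p <= df.exons$end),
                      logical(1))

# Subsetting with the logical vector keeps posn exactly as it was
kept <- df.cov[in_any]
print(kept)
```

Because this is a row filter rather than a join, each position is kept at most once, so no duplicate removal is needed afterwards.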

Another (albeit longer) possibility:
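# x = df.exons, i = df.cov, so the result has at most one row per row of df.cov
df.exons[df.cov
         , on = .(start <= posn, end >= posn)  # keep rows where start <= posn <= end
         , mult = 'first'                      # use only the first matching range
         , nomatch = 0                         # drop positions outside every range
         , .(posn = i.posn, att)][]            # i.posn: posn as stored in df.cov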
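A further variant keeps the join exactly as in the question and selects df.cov's own posn in j with the x. prefix. This is a sketch assuming the installed data.table version supports the x. prefix in j; the names res and res_unique are hypothetical. It reproduces the expected result from the question, duplicate 165 row included, and then drops the duplicates and sorts by posn.

```r
library(data.table)

df.cov   <- data.table(posn = c(1, 2, 3, 165, 1000),
                       att  = c("a", "b", "c", "d", "e"))
df.exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                       end   = c(3329, 2826, 662, 662, 168))

# Same non-equi join as in the question, but j picks x.posn,
# i.e. the posn values of df.cov (x) rather than the bounds from df.exons (i)
res <- df.cov[df.exons, on = .(posn >= start, posn <= end), nomatch = 0,
              .(posn = x.posn, att)]
print(res)

# 165 lies in both 164-662 and 1-168, so it appears once per matching range;
# remove the duplicate rows and sort by posn
res_unique <- unique(res)[order(posn)]
print(res_unique)
```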

Regarding r - How can I keep the join column unchanged in a data.table non-equi join?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44473644/
