R dplyr join by range or virtual column

Reposted. Author: 行者123. Updated: 2023-12-04 17:31:06

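I want to join two tibbles by a range or by a virtual (computed) column. But it seems the `by` parameter only accepts a chr or vector(chr) of existing column names.

In my example I have a tibble `d` with a column `value`, and a tibble `r` with a `from` and a `to` column.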

d <- tibble(value = seq(1,6, by = 0.2))
r <- tibble(from = seq(1,6), to = c(seq(2,6),Inf), class = LETTERS[seq(1,6)])

> d
# A tibble: 26 x 1
   value
   <dbl>
 1   1.0
 2   1.2
 3   1.4
 4   1.6
 5   1.8
 6   2.0
 7   2.2
 8   2.4
 9   2.6
10   2.8
# ... with 16 more rows

> r
# A tibble: 6 x 3
   from    to class
  <int> <dbl> <chr>
1     1     2     A
2     2     3     B
3     3     4     C
4     4     5     D
5     5     6     E
6     6   Inf     F

Now I want to join `value` in `d` to the range between `from` and `to` in `r` (this is pseudocode and does not run):
d %>% inner_join(r, by = "value between from and to")     # >= and <

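I couldn't find a way to do this, so I decided to join the `floor` of `value` in `d` with the `from` column in `r` (this does not work either, because `by` expects existing column names):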
d %>% inner_join(r, by = c("floor(value)" = "from"))

Of course, I can create a second column to work around the problem:
d %>%
  mutate(join_value = floor(value)) %>%
  inner_join(r, by = c("join_value" = "from")) %>%
  select(value, class)

# A tibble: 26 x 2
   value class
   <dbl> <chr>
 1   1.0     A
 2   1.2     A
 3   1.4     A
 4   1.6     A
 5   1.8     A
 6   2.0     B
 7   2.2     B
 8   2.4     B
 9   2.6     B
10   2.8     B
# ... with 16 more rows

Isn't there a more convenient way?

Thanks

Best answer

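I don't think inequality joins are implemented in dplyr yet, or that they ever will be (see the discussion on "Join on inequality constraints"), but this is a good case for a SQL join: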

library(tibble)
library(sqldf)

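# d and r are the tibbles defined in the question.
# as_tibble() replaces the deprecated as.tibble().
as_tibble(sqldf("select d.value, r.class from d
                 join r on d.value >= r.'from' and
                           d.value < r.'to'"))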

Alternatively, if you want to integrate the join into a dplyr chain, you can use fuzzyjoin::fuzzy_join:
library(dplyr)
library(fuzzyjoin)

d %>%
  fuzzy_join(r, by = c("value" = "from", "value" = "to"),
             match_fun = list(`>=`, `<`)) %>%
  select(value, class)

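Result (note that the 31 rows shown here, with 2.0 matched to both A and B, correspond to an inclusive upper bound `<=`; with the strict `<` used in the code above, each value matches exactly one class, giving 26 rows):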
# A tibble: 31 x 2
   value class
   <dbl> <chr>
 1   1.0     A
 2   1.2     A
 3   1.4     A
 4   1.6     A
 5   1.8     A
 6   2.0     A
 7   2.0     B
 8   2.2     B
 9   2.4     B
10   2.6     B
# ... with 21 more rows
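
The result above lists 2.0 under both A and B, which depends on whether the upper bound of each range is inclusive. As a minimal base-R sketch (no packages; the names `half_open` and `closed` are illustrative and not part of the original answer), the matching pairs for each choice can be counted like this:

```r
# Same data as in the question, built with base R only
d <- data.frame(value = seq(1, 6, by = 0.2))
r <- data.frame(from = seq(1, 6), to = c(seq(2, 6), Inf),
                class = LETTERS[seq(1, 6)])

# Logical matrices: one row per value in d, one column per range in r
lower_ok  <- outer(d$value, r$from, `>=`)
half_open <- lower_ok & outer(d$value, r$to, `<`)   # from <= value <  to
closed    <- lower_ok & outer(d$value, r$to, `<=`)  # from <= value <= to

sum(half_open)  # number of matched pairs with a strict upper bound
sum(closed)     # number of matched pairs with an inclusive upper bound

# With the strict bound, every value falls into exactly one range
all(rowSums(half_open) == 1)
```

With the half-open ranges the question asks for (`>=` and `<`), the join cannot duplicate boundary values such as 2.0; duplicates only appear when both bounds are inclusive.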

Note that in the SQL query I put single quotes around `from` and `to`, because they are reserved words in SQL.
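
Newer dplyr releases have since gained this feature: dplyr 1.1.0 introduced `join_by()`, which accepts inequality conditions directly. The following is a sketch under that assumption (it requires dplyr >= 1.1.0 and is not part of the original answer; the name `res` is illustrative):

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides join_by()

# Same data as in the question
d <- tibble(value = seq(1, 6, by = 0.2))
r <- tibble(from = seq(1, 6), to = c(seq(2, 6), Inf),
            class = LETTERS[seq(1, 6)])

# In join_by(), the left-hand side of each condition refers to d
# and the right-hand side refers to r
res <- d %>%
  inner_join(r, by = join_by(value >= from, value < to)) %>%
  select(value, class)

res
```

This keeps the range condition inside the dplyr chain without a helper column and without an extra package such as sqldf or fuzzyjoin.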

A similar question about "R dplyr join by range or virtual column" can be found on Stack Overflow: https://stackoverflow.com/questions/46795636/
