
Reducing a function over a grouped column in mutate

Reposted. Author: 行者123. Last updated: 2023-12-04 16:37:13

I have a function that finds the words two prose strings have in common:

library(dplyr)
library(purrr)
library(stringr)

# Function to get intersection of words
str_intersect_by_word_list <- function(string1, string2){
  map2_chr(str_split(string1, '\\s'), str_split(string2, '\\s'),
           ~str_c(intersect(.x, .y), collapse = " "))
}
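For readers who want to see what the helper does without purrr/stringr, here is a hypothetical base-R counterpart (the name intersect_words is made up for illustration and is not part of the question). It works element-wise over two character vectors and, for each pair, keeps the words present in both, in the order they appear in the first string. Because it takes two inputs and returns one, it is a binary function that reduce() can fold over a column.

```r
# Hypothetical base-R counterpart of str_intersect_by_word_list()
intersect_words <- function(string1, string2) {
  # Split both inputs into word vectors, pair them element-wise,
  # and keep the words found in both (in string1's word order)
  mapply(
    function(a, b) paste(intersect(a, b), collapse = " "),
    strsplit(string1, "\\s"), strsplit(string2, "\\s"),
    USE.NAMES = FALSE
  )
}

intersect_words(c("Hi I'm Abe", "Hi there"), c("Hi I'm Beau", "there you are"))
# c("Hi I'm", "there")
```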

and a table containing the strings to match:

# Sample data
my_df <- tibble(
  grp = rep(LETTERS[1:3], each = 3),
  strng = c(
    "Hi I'm Abe",
    "Hi I'm Beau",
    "Hi I'm Cat",
    "Hi there I'm Doug",
    "Hi there I'm Emily",
    "Hi there I'm Finn",
    "Hi it's nice to be here",
    "Hi it's nice to meet you",
    "Hi it's nice outside"
  )
)

If I want to create a column holding the string common to all rows, I can do this:

# This works as expected
my_df %>%
  mutate(
    common_string = my_df %>%
      pull(strng) %>%
      reduce(str_intersect_by_word_list)
  )

which gives

# A tibble: 9 x 3
  grp   strng                    common_string
  <chr> <chr>                    <chr>
1 A     Hi I'm Abe               Hi
2 A     Hi I'm Beau              Hi
3 A     Hi I'm Cat               Hi
4 B     Hi there I'm Doug        Hi
5 B     Hi there I'm Emily       Hi
6 B     Hi there I'm Finn        Hi
7 C     Hi it's nice to be here  Hi
8 C     Hi it's nice to meet you Hi
9 C     Hi it's nice outside     Hi
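The reduction folds the pairwise intersection from left to right, so for three strings it computes intersect(intersect(w1, w2), w3), where wi is the word vector of the i-th string. A minimal base-R sketch of that fold, using one string from each group (an illustrative subset, not the full table), shows why only "Hi" survives across the whole column:

```r
# One sample string from each group in my_df
strings <- c("Hi I'm Abe", "Hi there I'm Doug", "Hi it's nice outside")

# Split each string into a vector of words
word_lists <- strsplit(strings, "\\s+")

# Reduce() folds intersect() left to right:
# intersect(intersect(word_lists[[1]], word_lists[[2]]), word_lists[[3]])
shared <- Reduce(intersect, word_lists)

paste(shared, collapse = " ")  # "Hi"
```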

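I want to create a column holding the string common to each group. However, inside the grouped mutate I can only access either the entire strng column, which gives the same output as above, or the current value of strng, which causes an error because my function str_intersect_by_word_list needs two inputs.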

I tried referencing cur_data_all(), but I don't think that is the intended use of that function, and it gives me the same result as above.

# This fails.
my_df %>%
  group_by(grp) %>%
  mutate(
    grp_string = cur_data_all() %>%
      pull(strng) %>%
      reduce(str_intersect_by_word_list)
  )

The expected output is

# A tibble: 9 x 3
  grp   strng                    grp_string
  <chr> <chr>                    <chr>
1 A     Hi I'm Abe               Hi I'm
2 A     Hi I'm Beau              Hi I'm
3 A     Hi I'm Cat               Hi I'm
4 B     Hi there I'm Doug        Hi there I'm
5 B     Hi there I'm Emily       Hi there I'm
6 B     Hi there I'm Finn        Hi there I'm
7 C     Hi it's nice to be here  Hi it's nice
8 C     Hi it's nice to meet you Hi it's nice
9 C     Hi it's nice outside     Hi it's nice

How can I get the words common to each group?

Best answer

Consider the following:

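my_df %>%
  group_by(grp) %>%
  # Within each group: split every string into words, reduce the
  # resulting list with intersect(), then rejoin the shared words
  mutate(common_string = str_c(reduce(str_split(strng, "\\s+"), intersect),
                               collapse = ' '))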

# A tibble: 9 x 3
# Groups:   grp [3]
  grp   strng                    common_string
  <chr> <chr>                    <chr>
1 A     Hi I'm Abe               Hi I'm
2 A     Hi I'm Beau              Hi I'm
3 A     Hi I'm Cat               Hi I'm
4 B     Hi there I'm Doug        Hi there I'm
5 B     Hi there I'm Emily       Hi there I'm
6 B     Hi there I'm Finn        Hi there I'm
7 C     Hi it's nice to be here  Hi it's nice
8 C     Hi it's nice to meet you Hi it's nice
9 C     Hi it's nice outside     Hi it's nice
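The same per-group reduction can also be sketched in base R, as an alternative to the dplyr answer rather than a replacement for it. Here ave() splits strng by grp, applies a helper (hypothetically named common_words()) to each group, and recycles the single result back onto every row of that group. A plain data.frame stands in for the tibble so the sketch has no package dependencies.

```r
# Base-R sketch of the grouped computation (no dplyr/purrr/stringr)
my_df <- data.frame(
  grp = rep(LETTERS[1:3], each = 3),
  strng = c(
    "Hi I'm Abe", "Hi I'm Beau", "Hi I'm Cat",
    "Hi there I'm Doug", "Hi there I'm Emily", "Hi there I'm Finn",
    "Hi it's nice to be here", "Hi it's nice to meet you", "Hi it's nice outside"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: words shared by every string in one group,
# rejoined with single spaces
common_words <- function(s) {
  paste(Reduce(intersect, strsplit(s, "\\s+")), collapse = " ")
}

# ave() applies common_words() within each grp and writes the
# group's single result back onto all of that group's rows
my_df$common_string <- ave(my_df$strng, my_df$grp, FUN = common_words)
my_df
```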

For a similar question about reducing a function over a grouped column in mutate, see Stack Overflow: https://stackoverflow.com/questions/68353009/
