
Summarise + case_when with n()

Reposted. Author: bug小助手. Updated: 2023-10-28 21:51:51



I wonder what I am doing wrong here.


I am trying to use case_when() with summarise() to get one summary row per id, where the values depend on the number of rows for that id.


library(dplyr, warn.conflicts = FALSE)
mock <- tibble::tribble(~id, ~name, ~year,
                        1, "xy", 2022,
                        1, "xyz", 2021,
                        2, "aaa", NA,
                        3, "xaa", 2021)

mock %>%
  group_by(id) %>%
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      .default = NA_character_
    ),
    name2 = case_when(
      n() == 1 ~ name,
      .default = NA_character_
    )
  )
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups: id [3]
#> id condition name2
#> <dbl> <chr> <chr>
#> 1 1 problem <NA>
#> 2 1 problem <NA>
#> 3 2 <NA> aaa
#> 4 3 <NA> xaa

Created on 2023-09-09 with reprex v2.0.2


But I would like to get just this:


#> # A tibble: 3 × 3
#> id condition name2
#> <dbl> <chr> <chr>
#> 1 1 problem <NA>
#> 2 2 <NA> aaa
#> 3 3 <NA> xaa

Answers


case_when() is vectorised: it works element-wise down its inputs and returns a vector whose length is the common size of all of them. That's not what you are trying to do here. You are trying to conditionally choose a single output based on the group size, and n() is always a length-1 integer. In name2, however, the right-hand side `name` has one element per row of the group, so the length-1 condition n() == 1 is recycled to that length and case_when() returns one value per row; summarise() then produces two rows for id 1. If you want the output of summarize() to be length one, use if and else, not case_when() or if_else().


mock %>%
  group_by(id) %>%
  summarize(
    condition = if (n() > 1) 'problem' else NA_character_,
    name2 = if (n() == 1) name else NA_character_
  )
#> # A tibble: 3 x 3
#> id condition name2
#> <dbl> <chr> <chr>
#> 1 1 problem <NA>
#> 2 2 <NA> aaa
#> 3 3 <NA> xaa

Created on 2023-09-09 with reprex v2.0.2
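A minimal sketch of the size rule described above, run outside summarise() so each result can be inspected directly. It assumes dplyr >= 1.1.0 (for `.default`) and uses `n_rows` as a hypothetical stand-in for n() in the two-row id 1 group.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-ins for the id 1 group: two rows, so n() would be 2.
name <- c("xy", "xyz")
n_rows <- length(name)

# Every input is length 1, so case_when() returns a single value.
condition <- case_when(n_rows > 1 ~ "problem", .default = NA_character_)

# The RHS `name` has length 2, so the common size is 2: the length-1
# condition is recycled and two values come back (one per row).
name2_case_when <- case_when(n_rows == 1 ~ name, .default = NA_character_)

# `if` evaluates the scalar condition once and returns only the chosen branch.
name2_if <- if (n_rows == 1) name else NA_character_

length(name2_case_when)
length(name2_if)
```

The two `length()` calls show the difference: case_when() returns 2 values here, while the `if` version returns 1, which is what summarise() needs for a single row per group.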




You can use case_when like this:


Using first() or [1] avoids the issue explained by @Allan Cameron: it makes the right-hand side length 1, so case_when() returns a single value per group.


library(dplyr)

mock %>%
  group_by(id) %>%
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      TRUE ~ NA_character_
    ),
    name2 = case_when(
      # n() == 1 ~ name[1],
      n() == 1 ~ first(name),
      TRUE ~ NA_character_
    ),
    .groups = 'drop'
  )


id condition name2
<dbl> <chr> <chr>
1 1 problem NA
2 2 NA aaa
3 3 NA xaa
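A small sketch of why first() helps, again outside summarise() with a hypothetical `n_rows` standing in for n(). Wrapping the right-hand side in first() brings every input to length 1, so case_when() returns a single value even for the two-row id 1 group, and for a one-row group it simply returns that row's name.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-ins for the two-row id 1 group.
name <- c("xy", "xyz")
n_rows <- length(name)

# RHS `name` has length 2, so the common size (and the result) is 2.
plain <- case_when(n_rows == 1 ~ name, TRUE ~ NA_character_)

# first(name) has length 1, so all inputs are length 1 and so is the result.
wrapped <- case_when(n_rows == 1 ~ first(name), TRUE ~ NA_character_)

# For a one-row group (like id 2), first() returns that row's name.
single <- "aaa"
wrapped_single <- case_when(length(single) == 1 ~ first(single), TRUE ~ NA_character_)
```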


Try this


within(mock, {
  condition <- ave(name, id, FUN = \(x) switch(length(unique(x)), NA, 'problem'))
  name1 <- replace(name, !is.na(condition), NA)
  rm(name, year)
}) |> unique()
# id name1 condition
# 1 1 <NA> problem
# 3 2 aaa <NA>
# 4 3 xaa <NA>



Data:


mock <- structure(list(id = c(1, 1, 2, 3), name = c("xy", "xyz", "aaa", 
"xaa"), year = c(2022, 2021, NA, 2021)), row.names = c(NA, -4L
), class = "data.frame")
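A base-R sketch of the same ave() idea for readers on R older than 4.1, where the `\(x)` lambda and the `|>` pipe used above are not available. It is a variant, not the answerer's code: it counts rows with length(x), matching the question's n(), whereas the answer counts distinct names with length(unique(x)). The column name `name2` follows the question, and `stringsAsFactors = FALSE` is set explicitly so it also behaves on R < 4.0.

```r
mock <- data.frame(
  id = c(1, 1, 2, 3),
  name = c("xy", "xyz", "aaa", "xaa"),
  year = c(2022, 2021, NA, 2021),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each id group and recycles the length-1 result
# back to every row of that group, so `condition` has one value per row.
condition <- ave(mock$name, mock$id,
                 FUN = function(x) if (length(x) > 1) "problem" else NA_character_)

# Keep the name only where the id is not flagged.
name2 <- ifelse(is.na(condition), mock$name, NA_character_)

# The flagged id now has identical duplicate rows; unique() collapses them.
res <- unique(data.frame(id = mock$id, condition = condition, name2 = name2,
                         stringsAsFactors = FALSE))
res
```

Because every row of a flagged id carries the same `id`, `condition`, and `NA` name, unique() is enough to reduce the result to one row per id.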

Comments

Yeah, I know; I was just wondering whether this is a case_when bug or whether I am misunderstanding something :)

@olivroy this isn't really the correct use case for case_when(). n() returns a single atomic value, but case_when() is for iterating down a column. The output you are getting is because the length-1 condition n() == 1 is recycled to the length of `name`, which is the size of the current group. This is expected behaviour. If you want conditional logic based on a single value such as n() to produce a single value per group, use if and else, not case_when() or if_else().

That's what I just realized. The case_when() docs describe the return value as "A vector with the same size as the common size computed from the inputs in `...`".
