I wonder what I am doing wrong here.
I am trying to use case_when() with summarise() to get a summary for each id, depending on the number of rows for that id.
library(dplyr, warn.conflicts = FALSE)
mock <- tibble::tribble(
  ~id, ~name, ~year,
  1, "xy", 2022,
  1, "xyz", 2021,
  2, "aaa", NA,
  3, "xaa", 2021
)
mock %>%
  group_by(id) %>%
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      .default = NA_character_
    ),
    name2 = case_when(
      n() == 1 ~ name,
      .default = NA_character_
    )
  )
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups: id [3]
#> id condition name2
#> <dbl> <chr> <chr>
#> 1 1 problem <NA>
#> 2 1 problem <NA>
#> 3 2 <NA> aaa
#> 4 3 <NA> xaa
Created on 2023-09-09 with reprex v2.0.2
But I would just like to have:
#> # A tibble: 3 × 3
#> id condition name2
#> <dbl> <chr> <chr>
#> 2 1 problem <NA>
#> 3 2 <NA> aaa
#> 4 3 <NA> xaa
case_when is used to iterate down a column and create a new vector based on the existing values in other columns. That's not what you are trying to do here. You are trying to conditionally choose a single output based on the group size, which is always a length-1 integer. Effectively, the result of n() == 1 gets recycled to the same length as name, which is the group size, so a group with two rows produces two output rows. If you want the output of summarize to be length one, you should use if and else, not case_when or if_else.
mock %>%
  group_by(id) %>%
  summarize(
    condition = if (n() > 1) 'problem' else NA_character_,
    name2 = if (n() == 1) name else NA_character_
  )
#> # A tibble: 3 x 3
#> id condition name2
#> <dbl> <chr> <chr>
#> 1 1 problem <NA>
#> 2 2 <NA> aaa
#> 3 3 <NA> xaa
Created on 2023-09-09 with reprex v2.0.2
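To see the size difference outside of summarise(), here is a minimal sketch. It assumes dplyr 1.1.0 or later (for the .default argument); the variable n is a hypothetical stand-in for n(), and name holds the two names from the id == 1 group.

```r
library(dplyr, warn.conflicts = FALSE)

name <- c("xy", "xyz")   # the two names in the id == 1 group
n <- length(name)        # hypothetical stand-in for n() inside summarise()

# case_when() recycles the length-1 condition to the size of `name`,
# so the result has one element per row of the group
res_case_when <- case_when(n == 1 ~ name, .default = NA_character_)
length(res_case_when)
#> [1] 2

# if / else checks the single condition once and returns one branch as-is
res_if <- if (n == 1) name else NA_character_
length(res_if)
#> [1] 1
```

When the condition is true, the group has exactly one row, so name is itself length one and the if branch still returns a single value.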
You can use case_when like this: using first() or [1] will overcome the issue explained by @Allan Cameron, because every input to case_when is then length one.
library(dplyr)
mock %>%
  group_by(id) %>%
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      TRUE ~ NA_character_
    ),
    name2 = case_when(
      # n() == 1 ~ name[1],
      n() == 1 ~ first(name),
      TRUE ~ NA_character_
    ),
    .groups = 'drop'
  )
id condition name2
<dbl> <chr> <chr>
1 1 problem NA
2 2 NA aaa
3 3 NA xaa
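As a quick check, the following sketch compares the case_when() plus first() version with the if / else version on the question's data. The object names with_first and with_if are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

mock <- tibble::tribble(
  ~id, ~name, ~year,
  1, "xy", 2022,
  1, "xyz", 2021,
  2, "aaa", NA,
  3, "xaa", 2021
)

# case_when() with first(): every input is length one, so one row per group
with_first <- mock %>%
  group_by(id) %>%
  summarise(
    name2 = case_when(
      n() == 1 ~ first(name),
      TRUE ~ NA_character_
    ),
    .groups = "drop"
  )

# if / else: a single condition per group, one value per group
with_if <- mock %>%
  group_by(id) %>%
  summarise(
    name2 = if (n() == 1) name else NA_character_,
    .groups = "drop"
  )

nrow(with_first)                               # one row per id
identical(with_first$name2, with_if$name2)     # both approaches agree
```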
Try this:
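within(mock, {
  # switch() on the number of distinct names: 1 -> NA, 2 -> 'problem'
  # (only groups with one or two distinct names are handled here)
  condition <- ave(name, id, FUN=\(x) switch(length(unique(x)), NA, 'problem'))
  name1 <- replace(name, !is.na(condition), NA)
  rm(name, year)
}) |> unique()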
# id name1 condition
# 1 1 <NA> problem
# 3 2 aaa <NA>
# 4 3 xaa <NA>
Data:
mock <- structure(list(id = c(1, 1, 2, 3), name = c("xy", "xyz", "aaa",
"xaa"), year = c(2022, 2021, NA, 2021)), row.names = c(NA, -4L
), class = "data.frame")
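A base R variant that counts rows per id instead of distinct names could look like the sketch below. This is an assumption about the intent (the question talks about the number of rows), and the names n_rows and res are made up for illustration.

```r
mock <- data.frame(
  id = c(1, 1, 2, 3),
  name = c("xy", "xyz", "aaa", "xaa"),
  year = c(2022, 2021, NA, 2021)
)

# number of rows in each id group, repeated for every row of that group
n_rows <- ave(seq_along(mock$id), mock$id, FUN = length)

# build the per-row result, then collapse duplicate rows with unique()
res <- unique(data.frame(
  id = mock$id,
  condition = ifelse(n_rows > 1, "problem", NA_character_),
  name2 = ifelse(n_rows == 1, mock$name, NA_character_)
))
res
```

Unlike the switch() call, ifelse() on the row count works for groups of any size.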
Yea, I know, just wondering if this is a case_when bug, or I am misunderstanding something :)
@olivroy this isn't really the correct use case for case_when. n() returns a single atomic value, but case_when is for iterating down a column. The output you are getting is because n() is getting recycled into a vector the same size as the current group. This is expected behaviour. If you want conditional logic based on a single atomic value like n(), with a single result per group, you should use if and else, not case_when or if_else.
That's what I just realized. From the case_when docs: "Return value: A vector with the same size as the common size computed from the inputs in ..."