Summarise + case_when with n()

Reposted by: bug小助手. Updated: 2023-10-28 21:51:51

I wonder what I am doing wrong here.

I am trying to use case_when() inside summarise() to get one summary row per id, where the result depends on the number of rows for that id.

library(dplyr, warn.conflicts = FALSE)
mock <- tibble::tribble(~id, ~name, ~year,
                1, "xy", 2022,
                1, "xyz", 2021,
                2, "aaa", NA,
                3, "xaa", 2021)

mock %>% 
  group_by(id) %>% 
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      .default = NA_character_
    ),
    name2 = case_when(
      n() == 1 ~ name,
      .default = NA_character_
    )
  )
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   id [3]
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2

But I would just like to have:

#> # A tibble: 3 × 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

Recommended answers

case_when() is vectorised: it works down a column and builds a new vector from the existing values in other columns. That's not what you are trying to do here. You are trying to conditionally choose a single output based on the group size, n(), which is always a length-1 integer. Because case_when() returns a vector as long as the common size of its inputs, and name has one element per row in the group, the n() == 1 condition effectively gets recycled to the group size, so name2 comes back with one value per row. If you want the output of summarize to be length one, you should use if and else, not case_when or if_else.

mock %>% 
  group_by(id) %>% 
  summarize(
    condition = if(n() > 1) 'problem' else NA_character_, 
    name2     = if(n() == 1) name else NA_character_
  )
#> # A tibble: 3 x 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     2 <NA>      aaa  
#> 3     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2

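To see the recycling in isolation, here is a minimal sketch that mimics what summarise() evaluates for the group id == 1, outside of any data frame. The variables n_rows, cond_cw, name_cw and name_if are made up for illustration (n_rows stands in for n()), and the `.default` argument assumes dplyr >= 1.1.0, as in the question.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `.default` argument

# Stand-ins for what summarise() sees inside the group id == 1
name   <- c("xy", "xyz")   # the group's `name` column: one element per row
n_rows <- length(name)     # plays the role of n(): always length 1

# Every input is length 1, so the result is length 1
cond_cw <- case_when(n_rows > 1 ~ "problem", .default = NA_character_)

# `name` is length 2, so the length-1 condition is recycled to size 2
name_cw <- case_when(n_rows == 1 ~ name, .default = NA_character_)

# if/else tests one condition and returns exactly one branch
name_if <- if (n_rows == 1) name else NA_character_

length(cond_cw)  # 1
length(name_cw)  # 2
length(name_if)  # 1
```

The length-2 name_cw is what makes summarise() return two rows for id 1; the length-1 condition column is then recycled to match, which is why "problem" shows up twice in the question's output.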

You can use case_when like this:

Using first() or [1] makes every input to case_when() length 1, which overcomes the issue explained by @Allan Cameron.

library(dplyr)

mock %>% 
  group_by(id) %>% 
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      TRUE ~ NA_character_
    ),
    name2 = case_when(
      # n() == 1 ~ name[1],
      n() == 1 ~ first(name),
      TRUE ~ NA_character_
    ),
    .groups = 'drop'
  )
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   NA   
#> 2     2 NA        aaa  
#> 3     3 NA        xaa

Try this base R approach:

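within(mock, {
  # flag every row of an id that has more than one distinct name;
  # if/else covers any count above 1, not just exactly 2
  condition <- ave(name, id, FUN=\(x) if (length(unique(x)) > 1) 'problem' else NA)
  # keep the name only where no problem was flagged
  name1 <- replace(name, !is.na(condition), NA)
  rm(name, year)
  }) |> unique()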
#   id name1 condition
# 1  1  <NA>   problem
# 3  2   aaa      <NA>
# 4  3   xaa      <NA>

Data:

mock <- structure(list(id = c(1, 1, 2, 3), name = c("xy", "xyz", "aaa", 
"xaa"), year = c(2022, 2021, NA, 2021)), row.names = c(NA, -4L
), class = "data.frame")

More replies

Yea, I know, just wondering if this is a case_when bug, or I am misunderstanding something :)

@olivroy this isn't really the correct use case for case_when. n() returns a single atomic value, but case_when is for iterating down a column. The output you are getting is because n() is getting recycled into a vector the same size as the current group. This is expected behaviour. If you want conditional logic based on a single atomic value like n(), where the chosen branch also determines the size of the output, you should use if and else, not case_when or if_else.

That's what I just realized. From the case_when docs, under "Return value": "A vector with the same size as the common size computed from the inputs in ..."
