r - Merge rows and partially add them in R without groups

Reposted. Author: 行者123. Updated: 2023-12-04 10:36:15

Here is a reprex of my problem with dplyr:

library(tidyverse)

df <- tibble(State = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
             District_code = c(1:9),
             District = c("North", "West", "North West", "South", "East", "South East",
                          "XYZ", "ZYX", "AGS"),
             Population = c(1000000, 2000000, 3000000, 4000000, 5000000, 6000000,
                            7000000, 8000000, 9000000))

df
#> # A tibble: 9 x 4
#>   State District_code District   Population
#>   <chr>         <int> <chr>           <dbl>
#> 1 A                 1 North         1000000
#> 2 A                 2 West          2000000
#> 3 A                 3 North West    3000000
#> 4 A                 4 South         4000000
#> 5 A                 5 East          5000000
#> 6 A                 6 South East    6000000
#> 7 B                 7 XYZ           7000000
#> 8 B                 8 ZYX           8000000
#> 9 B                 9 AGS           9000000

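For some states, I need to merge districts, identified by their names, into fewer geographic categories. In particular, state A should have only "North - West - North West" and "South - East - South East". Some variables, such as Population, must be added up; others, such as District_code, should become NA. I found an example of operations across rows, but it is not quite the same, and grouping does not seem to apply.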

The final result should look like this:

new_df
#> # A tibble: 5 x 4
#>   State District_code District                  Population
#>   <chr>         <int> <chr>                          <dbl>
#> 1 A                NA North - West - North West    6000000
#> 2 A                NA South - East - South East   15000000
#> 3 B                 7 XYZ                          7000000
#> 4 B                 8 ZYX                          8000000
#> 5 B                 9 AGS                          9000000

In the real data frame, there are many variables (like Population) that must be added up, and many other variables (like District_code) that must get NA values.

Thanks a lot for your help!

Best answer

You can use fct_collapse to specify the new factor levels, then use summarise on the new groups:

df %>%
  mutate(District =
           fct_collapse(District,
                        "North - West - North West" = c("North", "West", "North West"),
                        "South - East - South East" = c("South", "East", "South East"))) %>%
  group_by(State, District) %>%
  summarise(Population = sum(Population),
            # a merged group (more than one row) gets NA; a single row keeps its code.
            # NA_integer_ matches the integer type of District_code in every group.
            District_code = if (n() > 1) NA_integer_ else District_code)

# A tibble: 5 x 4
# Groups:   State [2]
#   State District                  Population District_code
#   <chr> <fct>                          <dbl>         <int>
# 1 A     South - East - South East   15000000            NA
# 2 A     North - West - North West    6000000            NA
# 3 B     AGS                          9000000             9
# 4 B     XYZ                          7000000             7
# 5 B     ZYX                          8000000             8
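To see what fct_collapse() does on its own, here is a minimal sketch on a toy vector. The vector and the merged level name NW are made up for illustration; only fct_collapse() itself comes from the answer above.

```r
library(forcats)

# toy vector of district names (made up for illustration)
districts <- factor(c("North", "West", "East", "North West"))

# merge three of the levels into one hypothetical level "NW"
collapsed <- fct_collapse(districts, NW = c("North", "West", "North West"))

print(as.character(collapsed))  # merged values now share the label "NW"
print(levels(collapsed))        # fewer levels than the original factor
```

Because the merged rows now share one factor level, group_by(State, District) puts them in the same group, which is what lets summarise() add them up.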

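If you only want to change the districts for some specific states, you can add a case_when (or if_else) like this, and then summarise each column according to its type (here Population is a double, whereas District_code is an integer):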

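df %>%
  mutate(District =
           case_when(State == "A" ~
                       as.character(fct_collapse(District,
                         "North - West - North West" = c("North", "West", "North West"),
                         "South - East - South East" = c("South", "East", "South East"))),
                     TRUE ~ District)) %>%
  group_by(State, District) %>%
  # across() replaces the deprecated summarise_all(funs(...)); needs dplyr >= 1.0.0
  summarise(across(everything(), ~ if (is.double(.x)) {
    sum(.x)        # double columns such as Population are added up
  } else if (length(unique(.x)) > 1) {
    NA             # merged rows that disagree (e.g. District_code) get NA
  } else {
    unique(.x)     # otherwise keep the single shared value
  }))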
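Deciding by column type works for this example, but in a real data frame an integer column might also need to be summed. A minimal sketch, assuming you can list the columns by name instead: `sum_cols` and `na_cols` are hypothetical vectors that you would fill with your own column names, and `across()`, `all_of()` and `.groups` require dplyr 1.0.0 or later.

```r
library(dplyr)
library(forcats)

df <- tibble(State = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
             District_code = 1:9,
             District = c("North", "West", "North West", "South", "East", "South East",
                          "XYZ", "ZYX", "AGS"),
             Population = c(1000000, 2000000, 3000000, 4000000, 5000000, 6000000,
                            7000000, 8000000, 9000000))

# hypothetical column lists: extend them with the real column names
sum_cols <- c("Population")     # columns to add up within a merged group
na_cols  <- c("District_code")  # columns that become NA when rows are merged

new_df <- df %>%
  # collapse district names only for state A; other states keep theirs
  mutate(District = if_else(State == "A",
                            as.character(fct_collapse(District,
                              "North - West - North West" = c("North", "West", "North West"),
                              "South - East - South East" = c("South", "East", "South East"))),
                            District)) %>%
  group_by(State, District) %>%
  summarise(across(all_of(sum_cols), sum),
            # .x[NA_integer_] is a length-1 NA of the same type as .x
            across(all_of(na_cols), ~ if (length(.x) > 1) .x[NA_integer_] else .x),
            .groups = "drop")

print(new_df)
```

Listing the columns explicitly makes the rule independent of storage type, at the cost of having to maintain the two vectors.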

Regarding "r - Merge rows and partially add them in R without groups", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52723097/
