
r - dplyr::count() multiple columns

Reposted. Author: 行者123. Updated: 2023-12-04 12:07:42

I have the following dataset:

dat = structure(list(C86_1981 = c("Outer London", "Buckinghamshire", 
NA, "Ross and Cromarty", "Cornwall and Isles of Scilly", NA,
"Kirkcaldy", "Devon", "Kent", "Renfrew"), C96_1981 = c("Outer London",
"Buckinghamshire", NA, "Ross and Cromarty", "Not known/missing",
NA, "Kirkcaldy", NA, NA, NA), C00_1981 = c("Outer London", "Inner London",
"Lancashire", "Ross and Cromarty", NA, "Humberside", "Kirkcaldy",
NA, NA, NA), C04_1981 = c("Kent", NA, NA, "Ross and Cromarty",
NA, "Humberside", "Not known/missing", NA, NA, "Renfrew"), C08_1981 = c("Kent",
"Oxfordshire", NA, "Ross and Cromarty", "Cornwall and Isles of Scilly",
"Humberside", "Dunfermline", NA, NA, "Renfrew"), C12_1981 = c("Kent",
NA, NA, "Ross and Cromarty", "Cornwall and Isles of Scilly",
"Humberside", "Dunfermline", NA, NA, "Renfrew")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("C86_1981",
"C96_1981", "C00_1981", "C04_1981", "C08_1981", "C12_1981"))

I want to run dplyr::count() on every column. Expected output:
# A tibble: 10 x 3
   C86_1981                     dat86_n dat96_n ...
   <chr>                          <int>   <int>
 1 Buckinghamshire                    1       1
 2 Cornwall and Isles of Scilly       1      NA
 3 Devon                              1      NA
 4 Kent                               1      NA
 5 Kirkcaldy                          1       1
 6 Outer London                       1       1
 7 Renfrew                            1      NA
 8 Ross and Cromarty                  1       1
 9 <NA>                               2       5
10 Not known/missing                 NA       1

Currently I do this manually, one column at a time, and then combine the results with dplyr::full_join():
library("tidyverse")

dat86_n = dat %>%
count(C86_1981) %>%
rename(dat86_n = n)
dat96_n = dat %>%
count(C96_1981) %>%
rename(dat96_n = n)
# ...

dat_counts = dat86_n %>%
full_join(dat96_n, by = c("C86_1981" = "C96_1981"))
# ...

This works, but it is not very robust if any of my data change later. I was hoping to do this programmatically.
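A minimal sketch of such a programmatic version, assuming dplyr 1.0 or later: it repeats the manual count() and rename() step for every column and then folds the per-column tables together with full_join(). The small `dat` tibble is a hypothetical stand-in for the real data, and the shared key column `value` and the `<column>_n` count names are illustrative choices, not part of the original code.

```r
library(dplyr)

# Hypothetical toy stand-in for `dat` (two columns only)
dat <- tibble(
  C86_1981 = c("Kent", "Devon", NA, "Kent"),
  C96_1981 = c("Kent", NA, NA, "Renfrew")
)

# One count table per column: the grouping column is always called `value`
# (hypothetical name) and the count column is named <column>_n, which
# replaces the manual rename() step
counts <- lapply(names(dat), function(col) {
  count(dat, value = .data[[col]], name = paste0(col, "_n"))
})

# Fold the tables together; full_join() treats two NA keys as equal by default
dat_counts <- Reduce(function(a, b) full_join(a, b, by = "value"), counts)
dat_counts
```

Because the column list comes from names(dat), adding or renaming columns needs no code changes; purrr::reduce() could replace the base Reduce() call.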

I tried looping over the columns:
lapply(dat, count)
# Error in UseMethod("groups") :
# no applicable method for 'groups' applied to an object of class "character"

(purrr::map() gives the same error.) I think this error occurs because count() expects a tbl and a variable as separate arguments, so I also tried:
lapply(dat, function(x) {
count(dat, x)
})
# Error in grouped_df_impl(data, unname(vars), drop) :
# Column `x` is unknown
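The two errors have related causes. In `lapply(dat, count)` each column reaches count() as a bare character vector rather than a data frame. In `count(dat, x)`, count() quotes its argument and looks for a column literally named `x` inside `dat`, hence "Column `x` is unknown". A minimal sketch of one workaround, assuming dplyr 1.0 or later, wraps each vector in a one-column tibble before counting; the toy `dat` and the column name `value` are hypothetical.

```r
library(dplyr)

# Hypothetical toy stand-in for `dat`
dat <- tibble(
  C86_1981 = c("Kent", "Devon", NA, "Kent"),
  C96_1981 = c("Kent", NA, NA, "Renfrew")
)

# lapply() passes each column as a plain vector, so turn it back into a
# one-column data frame that count() can group on
counts <- lapply(dat, function(x) count(tibble(value = x), value))

# A named list of tibbles, one per original column
counts$C86_1981
```

purrr::map() accepts the same function, and the list elements can then be joined as in the manual approach.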

Again, purrr::map() gives the same error. I also tried variations of summarise_all():
dat %>% 
summarise_all(count)
# Error in summarise_impl(.data, dots) :
# Evaluation error: no applicable method for 'groups' applied to an object of class "character".

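I feel like I'm missing something obvious and the solution should be simple. A dplyr solution would be especially welcome, since that is the package I use most.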

Best answer

Using the tidyr package as well, the following code does the job:

dat %>% tidyr::gather(name, city) %>% dplyr::group_by(name, city) %>% dplyr::count() %>% dplyr::ungroup %>% tidyr::spread(name, n)

Result:
# A tibble: 15 x 7
   city                         C00_1981 C04_1981 C08_1981 C12_1981 C86_1981 C96_1981
 * <chr>                           <int>    <int>    <int>    <int>    <int>    <int>
 1 Buckinghamshire                    NA       NA       NA       NA        1        1
 2 Cornwall and Isles of Scilly       NA       NA        1        1        1       NA
 3 Devon                              NA       NA       NA       NA        1       NA
 4 Dunfermline                        NA       NA        1        1       NA       NA
 5 Humberside                          1        1        1        1       NA       NA
 6 Inner London                        1       NA       NA       NA       NA       NA
 7 Kent                               NA        1        1        1        1       NA
 8 Kirkcaldy                           1       NA       NA       NA        1        1
 9 Lancashire                          1       NA       NA       NA       NA       NA
10 Not known/missing                  NA        1       NA       NA       NA        1
11 Outer London                        1       NA       NA       NA        1        1
12 Oxfordshire                        NA       NA        1       NA       NA       NA
13 Renfrew                            NA        1        1        1        1       NA
14 Ross and Cromarty                   1        1        1        1        1        1
15 <NA>                                4        5        3        4        2        5
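gather() and spread() still work, but from tidyr 1.0.0 onward they are superseded by pivot_longer() and pivot_wider(). Below is a sketch of the same reshape, count, reshape idea using the newer verbs, assuming tidyr 1.1 or later and dplyr 1.0 or later. Here count(name, city) also replaces the group_by() + count() + ungroup() chain, because count() on an ungrouped input returns an ungrouped result. The toy `dat` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for `dat`
dat <- tibble(
  C86_1981 = c("Kent", "Devon", NA, "Kent"),
  C96_1981 = c("Kent", NA, NA, "Renfrew")
)

wide_counts <- dat %>%
  # long format: one row per (original column, value); NA values are kept
  pivot_longer(everything(), names_to = "name", values_to = "city") %>%
  # tally every (column, city) pair; the result is ungrouped
  count(name, city) %>%
  # back to one count column per original column; absent pairs become NA
  pivot_wider(names_from = name, values_from = n)

wide_counts
```

Adding `values_fill = 0` to pivot_wider() reports 0 instead of NA for city/column combinations that never occur.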

Regarding "r - dplyr::count() multiple columns", the original question is on Stack Overflow: https://stackoverflow.com/questions/46339538/
