
r - Counting categorical values grouped by another column with dplyr in R

Reposted. Author: 行者123. Updated: 2023-12-02 01:34:36

I want to summarize df by location name. The data looks like this:

location <- c("NY", "NC", "KA", "TX", "AZ", "NC", "SC", "ND", "SD", "MN","WA","MA","VT","CA","OR","NJ","OH","MI","IL","GA","FL")
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(location = sample(location, 20, replace = TRUE),
tree_type = sample(tree_type, 20, replace = TRUE),
density = runif(20, min = 24, max = 365),
income = runif(20, min = 37000, max = 62000))

What I want is this:
   location mean(density) mean(income) birch maple palm pine
1 AZ 38.44009 52032.95 0 0 1 0
2 CA 136.85112 42243.35 0 1 0 0
3 GA 101.24081 53405.60 2 0 0 0
4 IL 172.02651 46368.42 1 1 0 0
5 MA 198.69868 51117.18 0 0 0 1
6 MI 153.93358 60425.87 1 0 0 0
7 MN 185.05276 46468.68 0 0 1 0
8 NC 181.42187 46007.93 1 0 2 0
9 NJ 302.66541 59316.94 0 0 2 0
10 OR 303.88283 48497.03 0 0 0 2
11 SC 84.05136 50348.41 0 1 0 1
12 SD 158.47423 57894.27 0 0 1 0
13 VT 126.32967 42853.04 0 0 1 0

This is how I did it:
require(dplyr)
require(reshape2)
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income))
df_catvarslong <- as.data.frame(table(df[1:2]))
df_catvarswide <- dcast(df_catvarslong, location ~ tree_type, value.var = "Freq")
final_df <- left_join(df_quantvars, df_catvarswide, by = "location")

Is there no way to do this within dplyr, using the group_by idiom? At the risk of sounding silly, I tried this:
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income), table(df[1:2]))
What am I missing?

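Best answer

This reply is a bit late, but I had already put some work into it. Doing all of this in one go is a little tricky. This seems to work:

First I use group_by(location, tree_type) to count the trees, then I use group_by(location) to get the desired means. Next I use select(-c(density, income)) to drop the original density and income columns, which leaves duplicated rows that carry the correct aggregated counts. Finally I remove the duplicates with distinct() and use spread() from the tidyr library to convert to wide format, as you requested.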

library(dplyr)
library(tidyr)

df %>%
  arrange(location) %>%
  group_by(location, tree_type) %>%
  mutate(Count = n()) %>%
  group_by(location) %>%
  mutate(MeanDensity = mean(density),
         MeanIncome = mean(income)) %>%
  ungroup() %>%
  select(-c(density, income)) %>%
  distinct() %>%
  spread(key = tree_type, value = Count, fill = 0)

This gives me:
  location MeanDensity MeanIncome birch maple  palm  pine
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 AZ 244.18094 57474.94 0 0 1 0
2 FL 51.90693 42425.36 0 0 0 1
3 GA 341.18643 49385.44 0 0 0 2
4 IL 258.11124 37101.36 0 1 0 0
5 KA 267.92430 59699.20 1 0 0 0
6 MA 87.48623 60632.98 1 0 0 0
7 MI 197.18310 58837.00 0 0 0 1
8 NC 362.48531 50857.42 0 0 1 0
9 ND 315.57415 51465.06 0 0 1 0
10 NJ 233.72886 55877.40 0 0 1 1
11 NY 283.41522 49275.58 0 1 0 1
12 OH 350.23362 40901.73 0 0 1 0
13 OR 267.68415 38954.04 0 2 0 0
14 SC 260.12169 52837.10 0 1 0 0
15 SD 76.29782 54986.76 0 1 0 0
16 VT 341.80646 44547.77 1 0 0 0
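
For readers on a recent tidyr, where spread() is superseded by pivot_wider(), the same result can probably be reached with two short summaries joined on location. This is only a sketch of an alternative, not the method used in the answer above. It assumes dplyr >= 1.0.0 (for the .groups argument) and tidyr >= 1.1.0 (for a scalar values_fill), and the small data frame is made up so that the result is reproducible:

```r
library(dplyr)
library(tidyr)

# Hypothetical fixed data (not the random sample from the question)
df <- data.frame(
  location  = c("NC", "NC", "NC", "AZ", "SC", "SC"),
  tree_type = c("palm", "palm", "birch", "palm", "maple", "pine"),
  density   = c(100, 200, 300, 40, 80, 90),
  income    = c(40000, 50000, 60000, 52000, 50000, 51000)
)

# Per-location means of the numeric columns
means <- df %>%
  group_by(location) %>%
  summarise(MeanDensity = mean(density),
            MeanIncome  = mean(income),
            .groups = "drop")

# Per-location counts of each tree type, spread into one column per type;
# combinations that never occur are filled with 0
counts <- df %>%
  count(location, tree_type) %>%
  pivot_wider(names_from = tree_type, values_from = n, values_fill = 0)

# One row per location: means first, then the count columns
result <- left_join(means, counts, by = "location")
print(result)
```

Here count() replaces the mutate(Count = n()) and distinct() steps, so no duplicated rows have to be cleaned up before reshaping.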

Regarding r - Counting categorical values grouped by another column with dplyr in R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31956104/
