
r - Aggregate with empty factors but keep rows

Reposted. Author: 行者123. Updated: 2023-12-01 11:24:29

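I had a similar problem with by(), where I accepted that I have to replace the resulting NAs manually. Now I want to aggregate my data.frame and keep its structure. For example, my larger dataset contains factors for 100 countries * 10 years * 5 segments, so it should reduce to 5000 rows. But sometimes some of the segment factor levels are empty, and I get fewer than 5000 rows. I can't get my head around it...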

My MWE still applies:

# All 3 categories are used
df1 <- data.frame(val = rep(1:4, 3), factor = cut(rep(1:4, 3), breaks = c(1, 2, 3, 4), include.lowest = TRUE, labels = LETTERS[1:3]))
# Third category is not used
df2 <- data.frame(val = rep(1:3, 4), factor = cut(rep(1:3, 4), breaks = c(1, 2, 3, 4), include.lowest = TRUE, labels = LETTERS[1:3]))

#df1 reduces to 3 rows as each category is used
aggregate(df1$val,list(df1$factor),sum)
#df2 reduces to 2 rows because C is empty
aggregate(df2$val,list(df2$factor),sum)
#I would like
data.frame(Group.1=LETTERS[1:3], x=c(12,12,0))

Group.1 x
1 A 12
2 B 12
3 C 0
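The rows disappear because aggregate() by default only builds groups from the factor values that actually occur, so an empty level such as C produces no row at all. As a base-R sketch (not taken from the answers below), tapply() in R 3.4.0 or later accepts a `default` value for cells of empty levels, and xtabs() sums a numeric column over every factor level, giving 0 for empty ones:

```r
# Recreate df2 from the question: level "C" exists but has no rows
df2 <- data.frame(val = rep(1:3, 4),
                  factor = cut(rep(1:3, 4), breaks = c(1, 2, 3, 4),
                               include.lowest = TRUE, labels = LETTERS[1:3]))

table(df2$factor)   # C is a level with a count of 0

# tapply() does not call sum() for empty levels; it fills them with `default`
res <- tapply(df2$val, df2$factor, sum, default = 0)
data.frame(Group.1 = names(res), x = as.vector(res))

# xtabs() sums `val` over all levels of `factor`; empty levels get 0
tab <- xtabs(val ~ factor, data = df2)
as.data.frame(tab)
```

Both keep all three levels, so no NA replacement step is needed afterwards.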

Best answer

# create dataset
df2 <- data.frame(val = rep(1:3, 4), factor = cut(rep(1:3, 4), breaks = c(1, 2, 3, 4), include.lowest = TRUE, labels = LETTERS[1:3]))

library(dplyr)

levels(df2$factor) %>% # get distinct levels of the factor variable
data.frame(factor = .) %>% # create a data frame
left_join(df2 %>% # join with
group_by(factor) %>% # for each value that exists
summarise(x = sum(val)), by = "factor") %>% # sum column val
mutate(x = coalesce(x, 0L)) # replace NAs with 0s

# factor x
# 1 A 12
# 2 B 12
# 3 C 0
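A shorter dplyr variant, assuming dplyr 1.0.0 or later (`.drop` in group_by() and `.groups` in summarise()), keeps the empty group directly instead of joining back onto the levels. This is a sketch, not part of the original answer:

```r
library(dplyr)

# Recreate df2 from the question: level "C" has no rows
df2 <- data.frame(val = rep(1:3, 4),
                  factor = cut(rep(1:3, 4), breaks = c(1, 2, 3, 4),
                               include.lowest = TRUE, labels = LETTERS[1:3]))

res <- df2 %>%
  group_by(factor, .drop = FALSE) %>%          # keep groups for levels with no rows
  summarise(x = sum(val), .groups = "drop")    # sum() of an empty group is 0
res
```

`.drop = FALSE` only preserves unused levels of factor grouping columns; grouping by a character column would still drop combinations that never occur.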

Or without any packages:
dd = merge(data.frame(Group.1 = levels(df2$factor)),
aggregate(df2$val, list(df2$factor), sum), all.x = TRUE)
dd$x = ifelse(is.na(dd$x), 0, dd$x)
dd

# Group.1 x
# 1 A 12
# 2 B 12
# 3 C 0
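The question's real data has three grouping factors (100 countries * 10 years * 5 segments), and the same merge idea extends to that case by building the full grid of level combinations with expand.grid(). A minimal sketch on a hypothetical toy dataset; the column names `country`, `year`, `segment` and all values are invented for illustration:

```r
# Hypothetical toy data: 2 countries * 2 years * 3 segments = 12 combinations,
# but segment "C" never occurs, so only 8 combinations have rows
df <- data.frame(
  country = factor(rep(c("DE", "FR"), each = 4)),
  year    = factor(rep(c(2000, 2001), times = 4)),
  segment = factor(rep(c("A", "B"), each = 2, times = 2), levels = c("A", "B", "C")),
  val     = 1:8
)

agg <- aggregate(val ~ country + year + segment, data = df, FUN = sum)
nrow(agg)   # 8: combinations without rows are dropped

# Every combination of levels, then left-join the sums onto it
full <- expand.grid(country = levels(df$country),
                    year    = levels(df$year),
                    segment = levels(df$segment))
res <- merge(full, agg, all.x = TRUE)
res$val[is.na(res$val)] <- 0
nrow(res)   # 12
```

With the real data this grid would have 100 * 10 * 5 = 5000 rows. `as.data.frame(xtabs(val ~ country + year + segment, data = df))` should also give all 12 combinations, with the sums in a column named `Freq`.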

Or, to check whether it is faster, use the data.table package:
library(data.table)

# assuming you start with a data frame
df2 <- data.frame(val = rep(1:3, 4), factor = cut(rep(1:3, 4), breaks = c(1, 2, 3, 4), include.lowest = TRUE, labels = LETTERS[1:3]))

# create a data table with all unique values of the variable "factor" and an index (key) on that variable
dt_levels = data.table(factor = levels(df2$factor), key = "factor")

# make df2 a data table with an index on column "factor" and aggregate
dt_sum = setDT(df2, key = "factor")[, list(Sum = sum(val)), by = "factor"]

# left join the two data tables and replace NA values with 0s
dt_result = dt_sum[dt_levels][, Sum := ifelse(is.na(Sum), 0, Sum)]

dt_result[]

# factor Sum
# 1: A 12
# 2: B 12
# 3: C 0
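For several grouping factors, data.table's CJ() (cross join) can play the role of the level table. A sketch on the same kind of hypothetical toy data as above (column names and values invented); it relies on data.table allowing a factor column in x to be joined to a character column in i:

```r
library(data.table)

# Hypothetical toy data: segment "C" never occurs
dt <- data.table(
  country = factor(rep(c("DE", "FR"), each = 4)),
  year    = factor(rep(c(2000, 2001), times = 4)),
  segment = factor(rep(c("A", "B"), each = 2, times = 2), levels = c("A", "B", "C")),
  val     = 1:8
)

# All combinations of levels: 2 * 2 * 3 = 12 rows
full <- CJ(country = levels(dt$country),
           year    = levels(dt$year),
           segment = levels(dt$segment))

# Aggregate the observed combinations, then join them onto the full grid
res <- dt[, .(Sum = sum(val)), by = .(country, year, segment)][
  full, on = c("country", "year", "segment")]

# Combinations without rows come back as NA; replace them with 0 in place
res[is.na(Sum), Sum := 0L]
res
```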

Regarding "r - Aggregate with empty factors but keep rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38898276/
