R dplyr and data.table: aggregate then join back to original table

Reposted. Author: 行者123. Updated: 2023-12-02 02:51:02

I often find myself computing summary statistics for a data frame with the following dplyr pattern:

1. Aggregate <-
2. Original dataset %>%
3. group_by() %>%
4. filter() %>%
5. summarize() %>%
6. left_join() (back to Aggregate)

For example:

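library(dplyr)
set.seed(1)  # make the random example reproducible
Original <- data.frame(A = 1:100, B = sample(LETTERS, 100, replace = TRUE), C = rnorm(100))

# Calculate 1st summary statistic (mean of C over rows with A > 50)
Aggregate <- Original %>%
  group_by(B) %>%
  filter(A > 50) %>%
  summarize(meanC = mean(C))

# Calculate 2nd summary statistic and join it back to the first
# ("." is the piped summary, so it becomes the y table of the join)
Aggregate <- Original %>%
  group_by(B) %>%
  summarize(Q = sum(C)) %>%
  left_join(Aggregate, ., by = "B")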

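My question has two parts:

A) Is there a better way to build a summary-statistics table from another table? The left join feels clunky.

B) What is the "data.table" way to do this, i.e., how would I return the aggregate table?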

Aggregate[Aggregate[,meanC:=mean(C),by=.(B)]]
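For question B, a minimal data.table sketch of the same operations (an illustration assuming the data.table package is installed, not code from the thread; `DT` and `agg` are illustrative names). `DT[, .(...), by = B]` returns the aggregate table, `:=` adds a per-group column to every row by reference, and an update join with `on =` copies aggregate columns back onto the rows, so no self-join is needed:

```r
library(data.table)

set.seed(1)  # illustrative random data, mirroring the question's example
DT <- data.table(A = 1:100, B = sample(LETTERS, 100, replace = TRUE), C = rnorm(100))

# Aggregate table (one row per B), like group_by() + summarize().
# Subsetting C inside j gives the filtered statistic without a separate filter step.
agg <- DT[, .(meanC = mean(C[A > 50]), Q = sum(C)), by = B]

# Per-row group statistic added by reference, like group_by() + mutate().
DT[, meanC := mean(C), by = B]

# Update join: copy Q from agg onto every matching row of DT, by reference.
DT[agg, on = "B", Q := i.Q]
```

Note that a group with no rows where `A > 50` stays in `agg` with `meanC = NaN`, whereas filtering first would drop that group.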

Thanks for any suggestions...

Best answer

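You can avoid the join if you call mutate after group_by instead of summarize. (Caveat: I don't know how to compute filtered summary statistics this way. You may want to ungroup afterwards to avoid unexpected behavior later on.)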

library(tidyverse)
Original <- data.frame(A = 1:100, B = sample(LETTERS, 100, replace = TRUE), C = rnorm(100))

# Calculate unfiltered summary statistic, as in OP
Aggregate_OP <- Original %>%
  group_by(B) %>%
  summarize(meanC = mean(C)) %>%
  right_join(Original, by = "B") %>%
  select(A, B, C, meanC) %>% # reorder columns
  arrange(A) # restore the original row order

# Simpler, using mutate
Aggregate_mutate <- Original %>%
  group_by(B) %>%
  mutate(meanC = mean(C)) %>%
  ungroup()

identical(Aggregate_OP, Aggregate_mutate)
#> [1] TRUE
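Regarding the caveat about filtered statistics (and question A), one possible approach, sketched here and not taken from the original answer, is to subset `C` inside the summary instead of calling `filter()`. Both statistics then come out of a single `summarize()` for the aggregate table, or a single `mutate()` for the per-row version, with no join (`Original_stats` is an illustrative name):

```r
library(dplyr)

set.seed(1)  # illustrative random data, mirroring the question's example
Original <- data.frame(A = 1:100, B = sample(LETTERS, 100, replace = TRUE), C = rnorm(100))

# Aggregate table: both statistics in one summarize(), no join needed.
# The filtered statistic subsets C inside the summary instead of using filter().
Aggregate <- Original %>%
  group_by(B) %>%
  summarize(meanC = mean(C[A > 50]), Q = sum(C))

# Per-row version: same expressions inside mutate(), then ungroup().
Original_stats <- Original %>%
  group_by(B) %>%
  mutate(meanC = mean(C[A > 50]), Q = sum(C)) %>%
  ungroup()
```

One difference from `filter()` followed by `summarize()`: a group with no rows where `A > 50` is kept with `meanC = NaN` instead of being dropped.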

Regarding "R dplyr and data.table: aggregate then join back to original table", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52229249/
