r - Is there a dplyr or data.table equivalent to plyr::join_all? Joining by a list of data frames?

Reposted. Author: 行者123. Updated: 2023-12-01 22:22:07

Given this data.frame:

set.seed(4)
df <- data.frame(x = rep(1:5, each = 2), y = sample(50:100, 10, TRUE))
#    x  y
# 1  1 78
# 2  1 53
# 3  2 93
# 4  2 96
# 5  3 61
# 6  3 82
# 7  4 53
# 8  4 76
# 9  5 91
# 10 5 99

I want to write some simple functions (i.e. feature engineering) that create features for x, and then join the resulting data.frames together. For example:

library(dplyr)
count_x <- function(df) df %>% group_by(x) %>% summarise(count_x = n())
sum_y <- function(df) df %>% group_by(x) %>% summarise(sum_y = sum(y))
mean_y <- function(df) df %>% group_by(x) %>% summarise(mean_y = mean(y))
# and many more...

This can be done with plyr::join_all, but I would like to know whether there is a better (or more performant) way using dplyr or data.table.

df_with_features <- plyr::join_all(list(count_x(df), sum_y(df), mean_y(df)),
                                   by = 'x', type = 'full')

# > df_with_features
#   x count_x sum_y mean_y
# 1 1       2   131   65.5
# 2 2       2   189   94.5
# 3 3       2   143   71.5
# 4 4       2   129   64.5
# 5 5       2   190   95.0

Best answer

Combining @SimonOHanlon's data.table approach with @Jaap's Reduce and merge technique appears to give the most performant result:

library(data.table)
setDT(df)
count_x_dt <- function(dt) dt[, list(count_x = .N), keyby = x]
sum_y_dt <- function(dt) dt[, list(sum_y = sum(y)), keyby = x]
mean_y_dt <- function(dt) dt[, list(mean_y = mean(y)), keyby = x]

Reduce(function(...) merge(..., all = TRUE, by = c("x")),
       list(count_x_dt(df), sum_y_dt(df), mean_y_dt(df)))
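
To make the Reduce step concrete, here is a minimal sketch of what the fold expands to for three tables. The toy tables `a`, `b`, `c3` and the helper name `merge_all` are hypothetical and are not part of the original answer; they only illustrate that Reduce applies merge pairwise from left to right.

```r
library(data.table)

# Hypothetical toy tables sharing the key column x (not from the original post)
a  <- data.table(x = 1:3, count_x = c(2L, 2L, 2L))
b  <- data.table(x = 1:3, sum_y = c(10, 20, 30))
c3 <- data.table(x = 2:4, mean_y = c(10, 15, 20))

# Hypothetical helper: full outer join of a list of tables on the columns in `by`
merge_all <- function(tables, by) {
  Reduce(function(left, right) merge(left, right, by = by, all = TRUE), tables)
}

folded <- merge_all(list(a, b, c3), by = "x")

# Reduce works left to right, so the call above is equivalent to nesting merges by hand
nested <- merge(merge(a, b, by = "x", all = TRUE), c3, by = "x", all = TRUE)

stopifnot(isTRUE(all.equal(folded, nested)))
```

Because all = TRUE requests a full outer join, keys that are missing from some tables (x = 1 and x = 4 here) are kept and the absent features are filled with NA.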

Updated to include a tidyverse/purrr (purrr::reduce) approach:

library(tidyverse)
list(count_x(df), sum_y(df), mean_y(df)) %>%
  reduce(left_join)
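
Note that left_join keeps only the keys present in the first table, whereas the question used type = 'full'. The two agree when every feature table contains the same x values, as they do here. As a sketch, assuming you want full-join semantics and an explicit join key (which also avoids the "Joining with `by = ...`" message), something like the following may be closer to plyr::join_all. The toy tables `counts` and `sums` are hypothetical and not from the original post.

```r
library(dplyr)
library(purrr)

# Hypothetical toy feature tables with partially overlapping keys
counts <- tibble(x = c(1, 2, 3), count_x = c(2L, 2L, 2L))
sums   <- tibble(x = c(2, 3, 4), sum_y = c(10, 20, 30))

# Fold the list with a full join on x; the wrapper function makes the key explicit
features <- list(counts, sums) %>%
  reduce(function(left, right) full_join(left, right, by = "x"))
```

With left_join in place of full_join, the row for x = 4 would be dropped because it does not appear in the first table.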

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/33895570/
