
python - Computing aggregates (SUM, AVG, STDEV) over all nested level combinations of a specific column in a data frame

Reposted. Author: 行者123. Updated: 2023-12-01 01:39:15

I have a table that looks like this (simplified):

col_A   col_B   col_C
A 37 2
B 28 7
C 10 5
D 11 5
E 99 4

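I want to get a table that contains every nested combination of the levels of col_A, with the average computed within each subgroup. For example, choosing any 2 levels would look like this (10 unique level combinations):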
Grp_2   AVG (col_B/col_C)
A,B 7.76
A,C 6.61
A,D 7.55
… …
D,E 12.99

Choosing any 4 levels would look like this (5 unique level combinations):
Grp_4   AVG (col_B/col_C)
A,B,C,D 7.84
A,B,C,E 6.68
A,C,D,E 7.63
… …
B,C,D,E 13.12
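
The combination counts quoted above (10 for pairs, 5 for groups of four) are binomial coefficients, and they grow quickly with the number of levels. A minimal sketch in Python, one of the languages the question accepts; the 20-level case is a hypothetical input used only to show the growth:

```python
from math import comb

n_levels = 5  # A, B, C, D, E in the sample table

# Number of distinct subgroups for each group size k
counts = {k: comb(n_levels, k) for k in range(2, n_levels + 1)}
print(counts)  # {2: 10, 3: 10, 4: 5, 5: 1}

# Hypothetical larger input: 20 levels, groups of 10
print(comb(20, 10))  # 184756
```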

(In order of preference) R, SQL (Postgres, ANSI), Python.
My current solution (below) does not scale well as the number of levels in col_A grows:
require(tidyverse)
df <- tibble(col_A=c("A", "B","C", "D", "E"), col_B=c(37,28,10,11,99), col_C=c(2,7,5,5,4))

nested_subgroup_agg <- function(choice = 2, mydf = NULL) {
  library(tidyverse)
  dfx <-
    combn(c("A", "B", "C", "D", "E"), choice) %>%
    t() %>%
    as_tibble()
  try(if (choice <= 1) {
    stop("Can't Choose less than 2 levels at a time")
  }
  else{
    if (choice == 2) {
      val <- map_dbl(1:nrow(dfx), function(i) {
        (mydf$col_B[mydf$col_A == dfx$V1[i]] + mydf$col_B[mydf$col_A == dfx$V2[i]]) /
          (mydf$col_C[mydf$col_A == dfx$V1[i]] + mydf$col_C[mydf$col_A == dfx$V2[i]])
      })
    }
    else{
      if (choice == 3) {
        val <- map_dbl(1:nrow(dfx), function(i) {
          (mydf$col_B[mydf$col_A == dfx$V1[i]] + mydf$col_B[mydf$col_A == dfx$V2[i]] + mydf$col_B[mydf$col_A == dfx$V3[i]]) /
            (mydf$col_C[mydf$col_A == dfx$V1[i]] + mydf$col_C[mydf$col_A == dfx$V2[i]] + mydf$col_C[mydf$col_A == dfx$V3[i]])
        })
      }
      else{
        if (choice == 4) {
          val <- map_dbl(1:nrow(dfx), function(i) {
            (mydf$col_B[mydf$col_A == dfx$V1[i]] + mydf$col_B[mydf$col_A == dfx$V2[i]] + mydf$col_B[mydf$col_A == dfx$V3[i]] + mydf$col_B[mydf$col_A == dfx$V4[i]]) /
              (mydf$col_C[mydf$col_A == dfx$V1[i]] + mydf$col_C[mydf$col_A == dfx$V2[i]] + mydf$col_C[mydf$col_A == dfx$V3[i]] + mydf$col_C[mydf$col_A == dfx$V4[i]])
          })
        }
      }
    }
  })
  dfx$val <- val
  dfx
}
## Example
df <-
tibble(
col_A = c("A", "B", "C", "D", "E"),
col_B = c(37, 28, 10, 11, 99),
col_C = c(2, 7, 5, 5, 4)
)
nested_subgroup_agg(choice = 4, mydf = df)

Can you help improve this?

Best answer

One idea is to use combn to get all combinations of rows (given that each row holds exactly one letter), and then simply aggregate every 2 rows, i.e.

#get a df with all combination of rows
new_d <- dd[c(combn(nrow(dd), 2)),]

#Aggregate
#You can use `aggregate` or `lapply(split())`
lapply(split(new_d, rep(seq((nrow(new_d)) / 2), each = 2)), function(i)sum(i$col_C))
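
The same idea can be generalized to any group size. The following is a rough Python sketch, not the answerer's code: the function name nested_subgroup_ratio and the tuple layout of rows are assumptions made for illustration. It computes sum(col_B) / sum(col_C) per subgroup, the same ratio as the question's R function, so its figures need not match the illustrative tables in the question.

```python
from itertools import combinations

# Sample data from the question, one row per level of col_A:
# (col_A, col_B, col_C)
rows = [
    ("A", 37, 2),
    ("B", 28, 7),
    ("C", 10, 5),
    ("D", 11, 5),
    ("E", 99, 4),
]


def nested_subgroup_ratio(rows, choice=2):
    """Map each subgroup label (e.g. "A,B") to sum(col_B) / sum(col_C).

    Hypothetical helper: works for any group size >= 2, so no
    per-size branches are needed.
    """
    if choice < 2:
        raise ValueError("choose at least 2 levels at a time")
    result = {}
    for group in combinations(rows, choice):
        label = ",".join(level for level, _, _ in group)
        total_b = sum(b for _, b, _ in group)
        total_c = sum(c for _, _, c in group)
        result[label] = total_b / total_c
    return result


pairs = nested_subgroup_ratio(rows, choice=2)
print(len(pairs))              # 10
print(round(pairs["A,B"], 2))  # 7.22, i.e. 65 / 9
```

Because itertools.combinations yields each subgroup lazily, the loop does one pass per combination and the only growth in cost comes from the number of combinations itself.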

Data
dput(dd)
structure(list(col_A = structure(1:5, .Label = c("A", "B", "C",
"D", "E"), class = "factor"), col_B = c(37L, 28L, 10L, 11L, 99L
), col_C = c(2L, 7L, 5L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-5L))
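
For the other aggregates named in the title (SUM, AVG, STDEV), the row-index approach carries over directly. Here is a small Python sketch under stated assumptions: subgroup_stats is a hypothetical helper, the two lists restate col_A and col_B from the data above, and the standard deviation is the sample standard deviation from the standard library's statistics module.

```python
from itertools import combinations
from statistics import mean, stdev

# col_A and col_B from the data above
col_A = ["A", "B", "C", "D", "E"]
col_B = [37, 28, 10, 11, 99]


def subgroup_stats(levels, values, choice):
    """Yield (label, sum, mean, sample stdev) for each combination of rows.

    Hypothetical helper; `choice` must be >= 2 because the sample
    standard deviation needs at least two values.
    """
    for idx in combinations(range(len(levels)), choice):
        picked = [values[i] for i in idx]
        label = ",".join(levels[i] for i in idx)
        yield label, sum(picked), mean(picked), stdev(picked)


stats = list(subgroup_stats(col_A, col_B, choice=3))
print(len(stats))  # 10
for label, total, avg, sd in stats[:2]:
    print(label, total, avg, sd)
```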

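Regarding "python - Computing aggregates (SUM, AVG, STDEV) over all nested level combinations of a specific column in a data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59881213/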
