
R data.table: function wrapper around an ad-hoc join (with aggregation in a chain)

Reposted. Author: 行者123. Updated: 2023-12-04 23:17:45

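[data.table_1.9.6] The background to this question is that I am trying to build OLAP-like query functionality on a star-schema-like data layout, i.e. one large fact table and several meta tables. I am building function wrappers around data.table joins, followed by aggregation in a chain: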

library(data.table)

# dummy data
dt1 = data.table(id = 1:5, x = letters[1:5], a = 11:15, b = 21:25)
dt2 = data.table(k = 11:15, z = letters[11:15])

# standard data.table query with ad-hoc key -> works fine
dt1[dt2, c("z") := .(i.z), with = FALSE,
    on = c(a = "k")][, .(m = sum(a, na.rm = TRUE),
                         count = .N), by = c("z")]

# wrapper function with setkey -> works fine
agg_foo <- function(x, meta_tbl, x_key, meta_key, agg_var) {
  setkeyv(x, x_key)
  setkeyv(meta_tbl, meta_key)
  x[meta_tbl, (agg_var) := get(agg_var)][, .(a_sum = sum(a, na.rm = TRUE),
                                             count = .N),
                                         by = c(agg_var)]
  x[, (agg_var) := .(NULL)]
}

# call function (works fine)
agg_foo(x = dt1, meta_tbl = dt2, x_key = "a", meta_key = "k", agg_var = "z")

# wrapper function with ad-hoc key -> does not work
agg_foo_ad_hoc <- function(x, meta_tbl, x_key, meta_key, agg_var) {
  x[meta_tbl, (agg_var) := get(agg_var),
    on = c(x_key = meta_key)][, .(a_sum = sum(a, na.rm = TRUE),
                                  count = .N), by = c(agg_var)]
  x[, (agg_var) := .(NULL)]
}

# call function (causes error)
agg_foo_ad_hoc(x = dt1, meta_tbl = dt2, x_key = "a", meta_key = "k", agg_var = "z")

Error in forderv(x, by = rightcols) :
'by' value -2147483648 out of range [1,4]

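My guess is that I have to supply the ad-hoc "on" argument differently. I tried on = c(get(x_key) = meta_key), but that fails with an "unexpected bracket" error. I could use the setkey version of the function, but I wonder whether that is efficient, since the function will work on different meta tables depending on the aggregation attribute used, and would therefore keep re-setting the key. Or is setkey always preferred? The actual fact table (x here) has > 30 million rows.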
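Both failed attempts come down to how c() names its elements. The base-R check below is a minimal sketch using the same x_key and meta_key values as the failing call; the variable names literal, wanted and parsed are only illustrative. In c(x_key = meta_key) the left-hand side is taken literally as the element name, so the join ends up looking for a column called "x_key" instead of "a". The get(x_key) = ... form does not even parse, because an argument name has to be a symbol or a string, not an expression.

```r
x_key <- "a"
meta_key <- "k"

# The element name is the literal text "x_key", not the value stored in x_key
literal <- c(x_key = meta_key)
names(literal)

# What the hard-coded query passes to on=: name "a", value "k"
wanted <- c(a = "k")
names(wanted)

# An argument name cannot be an expression, so the parser rejects this form
parsed <- try(parse(text = "c(get(x_key) = meta_key)"), silent = TRUE)
inherits(parsed, "try-error")
```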

Best answer

All you need to do is construct a vector with the right names. Here is one way:

agg_foo_ad_hoc <- function(x, meta_tbl, x_key, meta_key, agg_var) {
  x[meta_tbl, (agg_var) := get(agg_var),
    on = setNames(meta_key, x_key)][, .(a_sum = sum(a, na.rm = TRUE),
                                        count = .N), by = c(agg_var)]
  x[, (agg_var) := .(NULL)]
}
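One thing to watch in the wrapper as written: its last expression is the column removal, so the call appears to return x (invisibly) rather than the aggregated table. The sketch below is one possible variant and not the original author's code. The function name agg_by_meta and the intermediate names join_on and res are hypothetical, and mget(paste0("i.", agg_var)) is swapped in for get(agg_var) so the column is taken explicitly from the meta table.

```r
library(data.table)

# dummy data, same as in the question
dt1 <- data.table(id = 1:5, x = letters[1:5], a = 11:15, b = 21:25)
dt2 <- data.table(k = 11:15, z = letters[11:15])

# Hypothetical variant of the wrapper that returns the aggregate
agg_by_meta <- function(x, meta_tbl, x_key, meta_key, agg_var) {
  # Names come from the value of x_key (e.g. "a"), values from meta_key (e.g. "k")
  join_on <- setNames(meta_key, x_key)

  # Add the lookup column to x by reference via an ad-hoc join
  x[meta_tbl, (agg_var) := mget(paste0("i.", agg_var)), on = join_on]

  # Aggregate and keep the result in a local variable
  res <- x[, .(a_sum = sum(a, na.rm = TRUE), count = .N), by = c(agg_var)]

  # Drop the temporary column again so x is left as it was
  x[, (agg_var) := NULL]

  res
}

res <- agg_by_meta(x = dt1, meta_tbl = dt2, x_key = "a", meta_key = "k", agg_var = "z")
print(res)
```

Keeping the aggregate in res and returning it last means the clean-up step no longer determines the function's value. The sum over column a is still hard-coded, as in the original wrapper.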

Regarding "r data.table function wrapper around ad-hoc join (with aggregation in a chain)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37706385/
