
R data.table: efficiently access and update a variable column name in j expression with grouping

Reposted. Author: 行者123. Updated: 2023-12-02 19:21:06

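I want to apply a transformation (whose type, loosely speaking, is "vector" -> "vector") to a list of columns in a data.table, and the transformation involves a grouping operation.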

Here is the setup, along with what I want to achieve:

library(data.table)

set.seed(123)
n <- 1000
DT <- data.table(
  date = seq.Date(as.Date('2000/1/1'), by='day', length.out = n),
  A = runif(n),
  B = rnorm(n),
  C = rexp(n))

DT[, A.prime := (A - mean(A))/sd(A), by=year(date)]
DT[, B.prime := (B - mean(B))/sd(B), by=year(date)]
DT[, C.prime := (C - mean(C))/sd(C), by=year(date)]

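The goal is to avoid typing out the column names. In my real application, I have a list of columns to which I want to apply this transformation.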

library(data.table)
set.seed(123)
n <- 1000
DT <- data.table(
  date = seq.Date(as.Date('2000/1/1'), by='day', length.out = n),
  A = runif(n),
  B = rnorm(n),
  C = rexp(n))

columns <- c("A", "B", "C")

for (x in columns) {
  # This doesn't work: x is a character string here, not the column itself.
  # target <- DT[, (x - mean(x, na.rm=TRUE))/sd(x, na.rm = TRUE), by=year(date)]

  # This doesn't work either.
  # target <- DT[, (..x - mean(..x, na.rm=TRUE))/sd(..x, na.rm = TRUE), by=year(date)]

  # THIS WORKS! But it is tedious writing "get(x)" every time.
  target <- DT[, (get(x) - mean(get(x), na.rm=TRUE))/sd(get(x), na.rm = TRUE), by=year(date)][, V1]

  # Add the result as a new column named "<x>.prime".
  set(DT, j = paste0(x, ".prime"), value = target)
}

Question: what is the idiomatic way to achieve the result above? Two things might be improved:

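  1. How can I avoid typing get(x) every time I use x to access a column?
  2. Is accessing [, V1] the most efficient approach? Can DT be updated directly by reference, without creating an intermediate data.table?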

Best answer

You can use .SDcols to specify the columns to operate on:

library(data.table)

columns <- c("A", "B", "C")
newcolumns <- paste0(columns, ".prime")

DT[, (newcolumns) := lapply(.SD, function(x) (x - mean(x))/sd(x)),
   by = year(date), .SDcols = columns]

This avoids calling get(x) each time and updates the data.table by reference.
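
As a rough check, the .SDcols approach can be compared against a loop that still uses get(x), but only once per column. The sketch below is self-contained and rebuilds DT from the question's setup. The helper name standardize and the copy DT2 are hypothetical, introduced here only for illustration, and na.rm = TRUE is carried over from the question's loop rather than from the answer.

```r
library(data.table)

set.seed(123)
n <- 1000
DT <- data.table(
  date = seq.Date(as.Date("2000-01-01"), by = "day", length.out = n),
  A = runif(n),
  B = rnorm(n),
  C = rexp(n)
)

columns <- c("A", "B", "C")
newcolumns <- paste0(columns, ".prime")

# Hypothetical helper (not in the original post): standardize one vector.
standardize <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)

# Approach from the answer: transform all columns at once via .SD / .SDcols,
# grouped by year, assigning the new columns by reference.
DT[, (newcolumns) := lapply(.SD, standardize), by = year(date), .SDcols = columns]

# Loop variant: get(x) is written once per column, and := avoids building
# an intermediate data.table and extracting V1.
DT2 <- copy(DT)[, (newcolumns) := NULL]
for (x in columns) {
  DT2[, (paste0(x, ".prime")) := standardize(get(x)), by = year(date)]
}

# The two variants should produce the same table.
all.equal(DT, DT2)
```

Moving the arithmetic into a small function is what removes the repeated get(x); whether to use the loop or .SDcols is then mostly a matter of taste, though .SDcols handles all columns in a single grouped pass.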

The original question, "R data.table: efficiently access and update a variable column name in j expression with grouping", is on Stack Overflow: https://stackoverflow.com/questions/63047444/
