
Reference problems in data.table after copying

Reposted. Author: 行者123. Updated: 2023-12-03 14:22:45

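I have a tricky question about assignment by reference in a data.table column that is nested inside another data.table. I was able to reproduce the behavior in the reproducible example below.

Sorry, it is still quite long and takes some time to fully understand, but it is the shortest example I could produce that shows my problem.

Suppose I create the following data.table named data_1, containing a single list column whose elements are data.tables: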

library(data.table)

set.seed(20200602L)

data_1 <- data.table(
  foo = replicate(5L, {
    data.table(
      bar = lapply(sample(3L, 5L, replace=TRUE), rpois, 1)
    )
  }, simplify=FALSE)
)

data_1[]
## foo
## 1: <data.table>
## 2: <data.table>
## 3: <data.table>
## 4: <data.table>
## 5: <data.table>

The contents of column foo can be inspected as follows:
data_1[, foo]
## [[1]]
## bar
## 1: 4,0,1
## 2: 0,2
## 3: 1,3,2
## 4: 1,1
## 5: 0
##
## [[2]]
## bar
## 1: 2
## 2: 0,3
## 3: 0
## 4: 2,3
## 5: 0,0
##
## [[3]]
## bar
## 1: 0,1,1
## 2: 1,2,1
## 3: 2,1
## 4: 1
## 5: 1
##
## [[4]]
## bar
## 1: 1
## 2: 3,3
## 3: 0
## 4: 2,2
## 5: 0,0,0
##
## [[5]]
## bar
## 1: 0,0
## 2: 0,0
## 3: 0,1
## 4: 2,1
## 5: 0

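Then I want to create a function fun() that adds a column baz to each element of column foo. This column baz holds each vector of the list column bar in reverse order, as shown below: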
fun <- function(data) {

  data[, .(lapply(foo, function(x) {
    x[, baz:=lapply(bar, function(y) {
      rev(y)
    })]
  }))]

}
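
To make the effect of the inner := concrete, here is a minimal sketch on a single hand-built table. The name x and its values are illustrative only and are not taken from data_1; the sketch also assumes that lapply(bar, rev) is equivalent to the anonymous-function form used in fun().

```r
library(data.table)

# Hypothetical stand-in for one element of foo: a list column of numeric vectors.
x <- data.table(bar = list(c(4, 0, 1), c(0, 2)))

# Add baz by reference; each element is the corresponding bar vector reversed.
x[, baz := lapply(bar, rev)]

x$baz[[1]]  # 1 0 4
x$baz[[2]]  # 2 0
```

Because := modifies x in place, no assignment with <- is needed here; that is the behavior fun() relies on for every nested table.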

Before applying the function to data_1, I copy it to data_2 because I need to keep the original intact.
data_2 <- copy(data_1)

invisible(fun(data_1))

data_1[, foo]
## [[1]]
## bar baz
## 1: 4,0,1 1,0,4
## 2: 0,2 2,0
## 3: 1,3,2 2,3,1
## 4: 1,1 1,1
## 5: 0 0
##
## [[2]]
## bar baz
## 1: 2 2
## 2: 0,3 3,0
## 3: 0 0
## 4: 2,3 3,2
## 5: 0,0 0,0
##
## [[3]]
## bar baz
## 1: 0,1,1 1,1,0
## 2: 1,2,1 1,2,1
## 3: 2,1 1,2
## 4: 1 1
## 5: 1 1
##
## [[4]]
## bar baz
## 1: 1 1
## 2: 3,3 3,3
## 3: 0 0
## 4: 2,2 2,2
## 5: 0,0,0 0,0,0
##
## [[5]]
## bar baz
## 1: 0,0 0,0
## 2: 0,0 0,0
## 3: 0,1 1,0
## 4: 2,1 1,2
## 5: 0 0

We can double-check that data_2 is still intact:
data_2[, foo]
## [[1]]
## bar
## 1: 4,0,1
## 2: 0,2
## 3: 1,3,2
## 4: 1,1
## 5: 0
##
## [[2]]
## bar
## 1: 2
## 2: 0,3
## 3: 0
## 4: 2,3
## 5: 0,0
##
## [[3]]
## bar
## 1: 0,1,1
## 2: 1,2,1
## 3: 2,1
## 4: 1
## 5: 1
##
## [[4]]
## bar
## 1: 1
## 2: 3,3
## 3: 0
## 4: 2,2
## 5: 0,0,0
##
## [[5]]
## bar
## 1: 0,0
## 2: 0,0
## 3: 0,1
## 4: 2,1
## 5: 0

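Up to this point, everything looks fine. But suppose I change my mind and want to apply fun() to data_2 as well. I expected it to work just as it did for data_1. Unfortunately, it does not: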
invisible(fun(data_2))
## Warning messages:
## 1: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 2: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 3: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 4: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 5: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.

data_2[, foo]
## [[1]]
## bar
## 1: 4,0,1
## 2: 0,2
## 3: 1,3,2
## 4: 1,1
## 5: 0
##
## [[2]]
## bar
## 1: 2
## 2: 0,3
## 3: 0
## 4: 2,3
## 5: 0,0
##
## [[3]]
## bar
## 1: 0,1,1
## 2: 1,2,1
## 3: 2,1
## 4: 1
## 5: 1
##
## [[4]]
## bar
## 1: 1
## 2: 3,3
## 3: 0
## 4: 2,2
## 5: 0,0,0
##
## [[5]]
## bar
## 1: 0,0
## 2: 0,0
## 3: 0,1
## 4: 2,1
## 5: 0

Can someone explain why, and perhaps point me to a way to solve the problem?

Session info:
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: SUSE Linux Enterprise Server 12 SP5
##
## Matrix products: default
## BLAS: /apps/R-4.0.0/lib/libRblas.so
## LAPACK: /apps/R-4.0.0/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.12.8
##
## loaded via a namespace (and not attached):
## [1] compiler_4.0.0 tools_4.0.0

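Best answer

The .internal.selfref attribute is not updated by copy() for the constituent data.tables. The nested tables in data_2 carry the same value as those in data_1: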

all.equal(
  lapply(data_1$foo, attr, '.internal.selfref'),
  lapply(data_2$foo, attr, '.internal.selfref')
)
# [1] TRUE
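
A complementary check, sketched here on small hand-built tables rather than the data above, is to look at how many column slots each nested table has allocated. truelength() is exported by data.table; the object names outer_1 and outer_2 and the helper slots() are assumptions made for illustration. If a nested table in the copy reports no slots beyond its current number of columns, := cannot add a column to it in place. The exact numbers depend on the data.table version, so none are shown.

```r
library(data.table)

# Hypothetical miniature of data_1: a list column of two nested data.tables.
outer_1 <- data.table(
  foo = list(
    data.table(bar = list(c(4, 0, 1), c(0, 2))),
    data.table(bar = list(2, c(0, 3)))
  )
)
outer_2 <- copy(outer_1)

# Hypothetical helper: allocated column slots of each nested table.
slots <- function(dt) {
  vapply(dt$foo, function(x) as.numeric(truelength(x)), numeric(1L))
}

slots(outer_1)
slots(outer_2)
```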

This needs to be updated; you can work around the problem by running alloc.col on the copied data.tables:
data_2 = copy(data_1)
# also possible to do lapply(foo, copy), but this should be slower
data_2[ , foo := lapply(foo, alloc.col)]

invisible(fun(data_1))

invisible(fun(data_2))
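
If relying on by-reference updates to nested tables feels fragile, an alternative is to copy each nested table explicitly, modify the copy, and assign the resulting list back to foo. This is a sketch under that assumption, not the approach from the answer above: fun_copy is a hypothetical name, and the small hand-built tables stand in for data_1. It trades extra copying for not depending on the state of .internal.selfref.

```r
library(data.table)

# Hypothetical miniature of data_1: a list column of two nested data.tables.
data_1 <- data.table(
  foo = list(
    data.table(bar = list(c(4, 0, 1), c(0, 2))),
    data.table(bar = list(2, c(0, 3)))
  )
)
data_2 <- copy(data_1)

# Hypothetical variant of fun(): work on a fresh copy of each nested table
# and store the modified copies back into the foo column.
fun_copy <- function(data) {
  data[, foo := lapply(foo, function(x) {
    copy(x)[, baz := lapply(bar, rev)]
  })]
  invisible(data)
}

fun_copy(data_2)

names(data_2$foo[[1]])  # "bar" "baz"
names(data_1$foo[[1]])  # "bar"
```

Since every nested table is copied before it is changed, data_1 stays untouched even though data_2 started out as its copy.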

This post on reference problems in data.table after copying is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/62164617/
