
r - Why does data.table coerce column classes when I assign all columns by reference

Reposted. Author: 行者123. Updated: 2023-12-04 14:14:04

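Here is something I don't understand about data.table: if I select one row and try to set all the values of that row to NA, the new one-row data.table is coerced to logical.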

library(data.table)

# Here is a sample table
DT <- data.table(a=rep(1L,3),b=rep(1.1,3),d=rep('aa',3))
DT
# a b d
# 1: 1 1.1 aa
# 2: 1 1.1 aa
# 3: 1 1.1 aa

#Here I extract a line, all the column types are kept... good
str(DT[1])
# Classes ‘data.table’ and 'data.frame': 1 obs. of 3 variables:
# $ a: int 1
# $ b: num 1.1
# $ d: chr "aa"
# - attr(*, ".internal.selfref")=<externalptr>

#Now here I want to set them all to `NA`...they all become logicals => WHY IS THAT ?
str(DT[1][,colnames(DT) := NA])
# Classes ‘data.table’ and 'data.frame': 1 obs. of 3 variables:
# $ a: logi NA
# $ b: logi NA
# $ d: logi NA
# - attr(*, ".internal.selfref")=<externalptr>

Edit: I think this is a bug:

str(DT[1][ , a := NA])
# Classes ‘data.table’ and 'data.frame': 1 obs. of 3 variables:
# $ a: logi NA
# $ b: num 1.1
# $ d: chr "aa"
# - attr(*, ".internal.selfref")=<externalptr>

str(DT[1:2][ , a := NA])
# Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
# $ a: int NA NA
# $ b: num 1.1 1.1
# $ d: chr "aa" "aa"
# - attr(*, ".internal.selfref")=<externalptr>

Best answer

The answer is given in ?":=":

Unlike <- for data.frame, the (potentially large) LHS is not coerced to match the type of the (often small) RHS. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given (whether or not fractional data is truncated). The motivation for this is efficiency. It is best to get the column types correct up front and stick to them. Changing a column type is possible but deliberately harder: provide a whole column as the RHS. This RHS is then plonked into that column slot and we call this plonk syntax, or replace column syntax if you prefer. By needing to construct a full length vector of a new type, you as the user are more aware of what is happening, and it's clearer to readers of your code that you really do intend to change the column type.



The motivation for all this is, of course, large tables (say 10GB in RAM), not tables of 1 or 2 rows.

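Put more simply: if length(RHS) == nrow(DT), the RHS (and its type) is plonked into that column slot, even if those lengths are 1. If length(RHS) < nrow(DT), the memory for the column (and its type) is kept in place, but the RHS is coerced and recycled to replace the (subset of) items in that column.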
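
A minimal sketch of the length rule above, reusing the question's integer column. The table names `three_rows` and `one_row` are illustrative and not from the original; the types in the comments follow the outputs reported in the question.

```r
library(data.table)

# length(RHS) < nrow(DT): the column keeps its type; NA is coerced and recycled
three_rows <- data.table(a = rep(1L, 3))
three_rows[, a := NA]
class(three_rows$a)   # "integer"

# length(RHS) == nrow(DT) == 1: the logical NA is plonked in, replacing the type
one_row <- data.table(a = 1L)
one_row[, a := NA]
class(one_row$a)      # "logical"
```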

If I need to change a column's type in a large table, I write something like:
DT[, col := as.numeric(col)]

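Here as.numeric allocates a new vector and coerces "col" into that new memory, which is then plonked into the column slot. It is as efficient as it can be. It is a plonk because length(RHS) == nrow(DT).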
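
The type change can be checked on a small table. This is only a sketch; the three-row `DT` below is made up for illustration.

```r
library(data.table)

DT <- data.table(col = c(1L, 2L, 3L))
class(DT$col)                  # "integer"

# as.numeric(col) has length nrow(DT), so the new double vector replaces the column
DT[, col := as.numeric(col)]
class(DT$col)                  # "numeric"
```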

If you want to overwrite a column with a different type containing some default value:
DT[, col := rep(21.5, nrow(DT))]    # i.e., deliberately harder

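If "col" was of type integer before, it will change to type numeric containing 21.5 in every row. Otherwise, plain DT[, col := 21.5] would result in a warning about 21.5 being coerced to 21 (unless DT has only 1 row!).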
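
A side-by-side sketch of the two forms, under the assumption that "col" starts out as an integer column. The table names `full` and `short` are hypothetical. The exact wording of the warning in the second case may differ between data.table versions, so it is not reproduced here.

```r
library(data.table)

# Full-length RHS: the integer column is replaced by a double column
full <- data.table(col = 1:3)
full[, col := rep(21.5, nrow(full))]
class(full$col)       # "numeric"

# Length-1 RHS on a 3-row table: the column stays integer and 21.5 is coerced
# to fit it. Per the answer above, expect a truncation warning here.
short <- data.table(col = 1:3)
short[, col := 21.5]
class(short$col)      # "integer"
```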

Regarding "r - Why does data.table coerce column classes when I assign all columns by reference", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18594017/
