r - data.table assignment involving factors

Reposted. Author: 行者123. Last updated: 2023-12-04 12:54:52

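I'm using data.table (1.8.9) and the := operator to update the values in one table from the values in another. The table to be updated (dt1) has many factor columns, and the table holding the updates (dt2) has similar columns whose values may not exist in the other table. If the columns in dt2 are character, I get an error message, but when I convert them to factors, I get incorrect values.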

How can I update the table without first converting all the factors to character?

Here is a simplified example:

library(data.table)

set.seed(3957)

## Create some sample data
## Note column y is a factor
dt1<-data.table(x=1:10,y=factor(sample(letters,10)))
dt1

## x y
## 1: 1 m
## 2: 2 z
## 3: 3 t
## 4: 4 b
## 5: 5 l
## 6: 6 a
## 7: 7 s
## 8: 8 y
## 9: 9 q
## 10: 10 i

setkey(dt1,x)

set.seed(9068)

## Create a second table that will be used to update the first one.
## Note column y is not a factor
dt2<-data.table(x=sample(1:10,5),y=sample(letters,5))
dt2

## x y
## 1: 2 q
## 2: 7 k
## 3: 3 u
## 4: 6 n
## 5: 8 t

## Join the first and second tables on x and attempt to update column y
## where there is a match
dt1[dt2,y:=i.y]

## Error in `[.data.table`(dt1, dt2, `:=`(y, i.y)) :
## Type of RHS ('character') must match LHS ('integer'). To check and
## coerce would impact performance too much for the fastest cases. Either
## change the type of the target column, or coerce the RHS of := yourself
## (e.g. by using 1L instead of 1)

## Create a third table that is the same as the second, except y
## is also a factor
dt3<-copy(dt2)[,y:=factor(y)]

## Join the first and third tables on x and attempt to update column y
## where there is a match
dt1[dt3,y:=i.y]
dt1

## x y
## 1: 1 m
## 2: 2 i
## 3: 3 m
## 4: 4 b
## 5: 5 l
## 6: 6 b
## 7: 7 a
## 8: 8 l
## 9: 9 q
## 10: 10 i

## No error message this time, but it is using the levels and not the labels
## from dt3. For example, row 2 should be q but it is i.
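
The wrong values above are consistent with the integer codes of dt3$y being copied into dt1$y and then read against dt1's own level set. The following base R sketch reproduces that mapping by hand, using the values printed above. The variable names are illustrative only, and this is an interpretation of the output rather than a statement about data.table internals.

```r
# Levels of dt1$y, in the default sorted order that factor() produces
target_levels <- levels(factor(c("m", "z", "t", "b", "l", "a", "s", "y", "q", "i")))

# dt3$y as a factor, in dt3's row order (x = 2, 7, 3, 6, 8)
update <- factor(c("q", "k", "u", "n", "t"))

# Integer codes relative to dt3's own levels: k, n, q, t, u
codes <- as.integer(update)

# Reading those codes against dt1's levels gives the values seen in dt1
misread <- target_levels[codes]
print(data.frame(intended = as.character(update), code = codes, shown = misread))
```

In a typical locale the `shown` column is i, a, m, b, l, which matches rows 2, 7, 3, 6 and 8 of the updated dt1.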

Page 3 of the data.table help file says:

When LHS is a factor column and RHS is a character vector with items missing from the factor levels, the new level(s) are automatically added (by reference, efficiently), unlike base methods.
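
For comparison, here is a small sketch of the base R behaviour the help text contrasts with. It uses a made-up three-element factor and only base functions: assigning a value that is not among the levels yields NA, unless the levels are extended first.

```r
y <- factor(c("m", "z", "t"))

# Base R: assigning an unknown level produces NA (normally with a warning)
y_base <- y
suppressWarnings(y_base[2] <- "k")
print(is.na(y_base[2]))

# Extending the levels first lets the same assignment succeed
y_ext <- y
levels(y_ext) <- union(levels(y_ext), "k")
y_ext[2] <- "k"
print(as.character(y_ext))
```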



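That makes it look as though what I tried should work, but I am clearly missing something. I wonder whether it is related to this similar question: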

rbindlist two data.tables where one has factor and other has character type for a column

Best answer

Here is a workaround:

dt1[dt2, z := i.y][!is.na(z), y := z][, z := NULL]

Note that z is a character column, and the second assignment works as expected. I'm not sure why the OP's original assignment did not.
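
A self-contained version of the workaround, as a sketch: the sample data is typed in literally from the tables printed in the question instead of relying on set.seed, and it assumes a data.table version in which assigning a character vector to a factor column adds the missing levels, as the help text quoted in the question describes.

```r
library(data.table)

# Data copied from the printed tables in the question
dt1 <- data.table(x = 1:10,
                  y = factor(c("m", "z", "t", "b", "l", "a", "s", "y", "q", "i")))
dt2 <- data.table(x = c(2L, 7L, 3L, 6L, 8L),
                  y = c("q", "k", "u", "n", "t"))
setkey(dt1, x)

# 1. Join on x and copy dt2's y into a temporary character column z
# 2. Where z was filled in, assign it to the factor column y
# 3. Drop the temporary column
dt1[dt2, z := i.y][!is.na(z), y := z][, z := NULL]

print(dt1)
```

Rows 2, 3, 6, 7 and 8 should now hold q, u, n, k and t, with the other rows unchanged.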

Regarding r - data.table assignment involving factors, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17538617/
