
r - Add missing rows to a data.table based on multiple key columns

Reposted. Author: 行者123. Last updated: 2023-12-04 10:43:16

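I have a data.table object in which several columns together identify a unique case. In the small example below, the variables "name", "job", and "sex" together form the unique ID. I want to add the missing rows so that every case has one row for each possible value of another variable, "from" (similar to expand.grid).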

library(data.table)
set.seed(1)
mydata <- data.table(name  = c("john","john","john","john","mary","chris","chris","chris"),
                     job   = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"),
                     sex   = c("male","male","male","male","female","female","male","male"),
                     from  = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"),
                     score = rnorm(8))

setkeyv(mydata, cols=c("name","job","sex"))

mydata[CJ(unique(name, job, sex), unique(from))]

This is the current data.table object:
> mydata
    name     job    sex from      score
1:  john teacher   male  NYT -0.6264538
2:  john teacher   male USAT  0.1836433
3:  john teacher   male   BG -0.8356286
4:  john teacher   male TIME  1.5952808
5:  mary  police female USAT  0.3295078
6: chris  lawyer female   BG -0.8204684
7: chris  lawyer   male  NYT  0.4874291
8: chris  doctor   male  NYT  0.7383247

This is the result I want:
> mydata
     name     job    sex from      score
 1:  john teacher   male  NYT -0.6264538
 2:  john teacher   male USAT  0.1836433
 3:  john teacher   male   BG -0.8356286
 4:  john teacher   male TIME  1.5952808
 5:  mary  police female  NYT         NA
 6:  mary  police female USAT  0.3295078
 7:  mary  police female   BG         NA
 8:  mary  police female TIME         NA
 9: chris  lawyer female  NYT         NA
10: chris  lawyer female USAT         NA
11: chris  lawyer female   BG -0.8204684
12: chris  lawyer female TIME         NA
13: chris  lawyer   male  NYT  0.4874291
14: chris  lawyer   male USAT         NA
15: chris  lawyer   male   BG         NA
16: chris  lawyer   male TIME         NA
17: chris  doctor   male  NYT  0.7383247
18: chris  doctor   male USAT         NA
19: chris  doctor   male   BG         NA
20: chris  doctor   male TIME         NA

This is what I tried:
setkeyv(mydata, cols=c("name","job","sex"))
mydata[CJ(unique(name, job, sex), unique(from))]

But I get the following error, and adding fromLast=TRUE (or FALSE) does not give me the correct result:
Error in unique.default(name, job, sex) : 
'fromLast' must be TRUE or FALSE
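
The error most likely comes from positional argument matching: `unique()` de-duplicates only its first argument, so `job` ends up matched to `incomparables` and `sex` to `fromLast`. A minimal sketch of the difference, using three short illustrative vectors (not the full data set) and the hypothetical names `failed` and `ids`:

```r
library(data.table)

name <- c("john", "john", "mary")
job  <- c("teacher", "teacher", "police")
sex  <- c("male", "male", "female")

# Positional matching sends `job` to `incomparables` and `sex` to `fromLast`,
# so this call fails instead of de-duplicating the three vectors together.
failed <- inherits(try(unique(name, job, sex), silent = TRUE), "try-error")

# Binding the columns into one table first lets unique() compare whole rows.
ids <- unique(data.table(name, job, sex))
print(ids)
```

Here `ids` holds one row per distinct (name, job, sex) combination, which is the kind of input the cross join needs.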

Here are the related answers I came across (none of them seems to handle multiple key columns):
add missing rows to a data table

Efficiently inserting default missing rows in a data.table

Fastest way to add rows for missing values in a data.frame?

Best answer

Here are a few possibilities - https://github.com/Rdatatable/data.table/pull/814

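# Cross join that accepts data.tables as well as plain vectors
CJ.dt = function(...) {
  # Cross join the row indices (for tables) or element indices (for vectors)
  rows = do.call(CJ, lapply(list(...), function(x) if (is.data.frame(x)) seq_len(nrow(x)) else seq_along(x)))
  # Subset each input by its index column and bind the pieces side by side
  do.call(data.table, Map(function(x, y) x[y], list(...), rows))
}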

setkey(mydata, name, job, sex, from)

mydata[CJ.dt(unique(data.table(name, job, sex)), unique(from))]
#     name     job    sex from      score
# 1: chris  doctor   male  NYT  0.7383247
# 2: chris  doctor   male   BG         NA
# 3: chris  doctor   male TIME         NA
# 4: chris  doctor   male USAT         NA
# 5: chris  lawyer female  NYT         NA
# 6: chris  lawyer female   BG -0.8204684
# 7: chris  lawyer female TIME         NA
# 8: chris  lawyer female USAT         NA
# 9: chris  lawyer   male  NYT  0.4874291
#10: chris  lawyer   male   BG         NA
#11: chris  lawyer   male TIME         NA
#12: chris  lawyer   male USAT         NA
#13:  john teacher   male  NYT -0.6264538
#14:  john teacher   male   BG -0.8356286
#15:  john teacher   male TIME  1.5952808
#16:  john teacher   male USAT  0.1836433
#17:  mary  police female  NYT         NA
#18:  mary  police female   BG         NA
#19:  mary  police female TIME         NA
#20:  mary  police female USAT  0.3295078
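
For comparison, here is a sketch of an alternative that avoids a helper function. It is not part of the original answer, and it assumes a data.table version that supports the `on=` join argument (1.9.6 or later). The names `ids`, `froms`, `grid`, and `result` are illustrative. Because the table is left unkeyed, the rows should come out in order of first appearance rather than sorted.

```r
library(data.table)

set.seed(1)
mydata <- data.table(name  = c("john","john","john","john","mary","chris","chris","chris"),
                     job   = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"),
                     sex   = c("male","male","male","male","female","female","male","male"),
                     from  = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"),
                     score = rnorm(8))

# Distinct ID combinations and distinct `from` values, in order of appearance
ids   <- unique(mydata[, .(name, job, sex)])
froms <- unique(mydata$from)

# Repeat each ID row once per `from` value, then attach the `from` values
grid <- ids[rep(seq_len(nrow(ids)), each = length(froms))]
grid[, from := rep(froms, times = nrow(ids))]

# Join the original data onto the full grid; combinations that did not
# exist in mydata keep their row and get NA for score
result <- mydata[grid, on = c("name", "job", "sex", "from")]
print(result)
```

The result has one row per ID and `from` combination (5 IDs times 4 sources), with `score` filled in only for the 8 rows that existed originally.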

Regarding "r - Add missing rows to a data.table based on multiple key columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27372027/
