r - Fill in missing rows with R data.table

Reposted. Author: 行者123. Updated: 2023-12-04 10:41:46

I have a data.table in R, fetched from a database, that looks like this:

date,identifier,description,location,value1,value2
2014-03-01,1,foo,1,100,200
2014-03-01,1,foo,2,200,300
2014-04-01,1,foo,1,100,200
2014-04-01,1,foo,2,100,200
2014-05-01,1,foo,1,100,200
2014-05-01,1,foo,2,100,200
2014-03-01,2,bar,1,100,200
2014-04-01,2,bar,1,100,200
2014-05-01,2,bar,1,100,200
2014-03-01,3,baz,1,100,200
2014-03-01,3,baz,2,200,300
2014-04-01,3,baz,1,100,200
2014-04-01,3,baz,2,100,200
2014-05-01,3,baz,1,100,200
2014-05-01,3,baz,2,100,200
2014-05-01,4,quux,2,100,200
<SNIP>

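To do some calculations on the data, I would like to massage it so that every combination of date, identifier, description and location has a row in the table, with NA for value1 and value2 where no data exists. I know the date range and all potential values of location.

I'm new to R and data.table, and at this point my head is spinning. The result I would like for the example table above is: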
date,identifier,description,location,value1,value2
2014-03-01,1,foo,1,100,200
2014-03-01,1,foo,2,200,300
2014-04-01,1,foo,1,100,200
2014-04-01,1,foo,2,100,200
2014-05-01,1,foo,1,100,200
2014-05-01,1,foo,2,100,200
2014-03-01,2,bar,1,100,200
2014-03-01,2,bar,2,NA,NA
2014-04-01,2,bar,1,100,200
2014-04-01,2,bar,2,NA,NA
2014-05-01,2,bar,1,100,200
2014-05-01,2,bar,2,NA,NA
2014-03-01,3,baz,1,100,200
2014-03-01,3,baz,2,200,300
2014-04-01,3,baz,1,100,200
2014-04-01,3,baz,2,100,200
2014-05-01,3,baz,1,100,200
2014-05-01,3,baz,2,100,200
2014-03-01,4,quux,1,NA,NA
2014-03-01,4,quux,2,NA,NA
2014-04-01,4,quux,1,NA,NA
2014-04-01,4,quux,2,NA,NA
2014-05-01,4,quux,1,NA,NA
2014-05-01,4,quux,2,100,200

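The data in the database is sparse: a given identifier/description/location combination may have any number of entries for each date, or none at all. For a given date range (e.g. 2014-03-01 to 2014-05-01), I want every identifier/description and location to have a row in the table for each date.

This seems like the kind of thing a neat data.table trick would handle, but I'm drawing a blank.

Edit: I've done this on a smaller scale for a single identifier/description by merging with another data.table, but I'm not sure how to do it with the added complexity of multiple identifiers/descriptions and locations.

Thanks very much for any replies.

Here is the dput output of the original data, which can easily be copied into R: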
structure(list(date = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 3L), 
.Label = c("2014-03-01", "2014-04-01", "2014-05-01"), class = "factor"),
identifier = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L),
description = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 4L),
.Label = c("bar", "baz", "foo", "quux"), class = "factor"),
location = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L),
value1 = c(100L, 200L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 200L, 100L, 100L, 100L, 100L, 100L),
value2 = c(200L, 300L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 300L, 200L, 200L, 200L, 200L, 200L)),
.Names = c("date", "identifier", "description", "location", "value1", "value2"),
row.names = c(NA, -16L),
class = c("data.table", "data.frame"))
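The answer below refers to a table named DT0, which the question never assigns. As a minimal sketch, a few rows of the sample can be read straight from text with fread(); the three-row subset here is only an illustration, and the name DT0 is taken from the answer.

```r
library(data.table)

# Read a small subset of the question's sample directly from text.
# DT0 is the name used in the answer; the row subset is illustrative only.
DT0 <- fread(text = "date,identifier,description,location,value1,value2
2014-03-01,1,foo,1,100,200
2014-03-01,1,foo,2,200,300
2014-05-01,4,quux,2,100,200")

# Inspect the column names and types that fread() inferred
str(DT0)
```

Assigning the structure(...) call above to DT0 and passing it through setDT() should give an equivalent table, with date and description stored as factors.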

Best answer

With help from @akrun and @eddi, here's the idiomatic (?) way:

mycols <- c("description", "date", "location")
setkeyv(DT0, mycols)
# cross join (CJ) the unique values of each key column, then join DT0 to that grid
DT1 <- DT0[J(do.call(CJ, lapply(mycols, function(x) unique(get(x)))))]
# alternately: DT1 <- DT0[DT0[, do.call(CJ, lapply(.SD, unique)), .SDcols = mycols]]
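To see why this produces the missing rows, here is a small sketch of the two pieces involved: CJ() builds every combination of its arguments, and joining a keyed table to that grid keeps every grid row. The tables grid, sparse and filled are made up for illustration and are not part of the original answer.

```r
library(data.table)

# CJ() builds the cross join (Cartesian product) of its arguments
grid <- CJ(description = c("bar", "quux"), location = 1:2)
print(grid)

# A sparse table that covers only two of the four combinations
sparse <- data.table(
  description = c("bar", "quux"),
  location    = c(1L, 2L),
  value1      = c(100L, 100L)
)
setkey(sparse, description, location)

# Joining to the grid keeps every grid row; combinations that are
# absent from `sparse` get NA in the non-key column
filled <- sparse[grid]
print(filled)
```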
The identifier column is missing (NA) for the new rows, but it can be filled in:
setkey(DT1, description)
# update join: copy identifier from the matching description in DT0
DT1[unique(DT0[, c("description", "identifier")]), identifier := i.identifier]
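On data.table versions that support the on= argument, the same two steps can probably be written without setting keys. The following is a sketch on a four-row subset of the sample, not the original answer's code; the names grid and lookup are illustrative, and dates are kept as character for simplicity.

```r
library(data.table)

# Four rows taken from the question's sample (identifiers 2 and 4 only)
DT0 <- data.table(
  date        = c("2014-03-01", "2014-04-01", "2014-05-01", "2014-05-01"),
  identifier  = c(2L, 2L, 2L, 4L),
  description = c("bar", "bar", "bar", "quux"),
  location    = c(1L, 1L, 1L, 2L),
  value1      = 100L,
  value2      = 200L
)

# Every combination of the distinct descriptions, dates and locations
grid <- CJ(description = DT0$description, date = DT0$date,
           location = DT0$location, unique = TRUE)

# Join DT0 to the grid: all grid rows are kept, unmatched ones get NA
DT1 <- DT0[grid, on = c("description", "date", "location")]

# Fill identifier for the new rows from a description -> identifier lookup
lookup <- unique(DT0[, .(description, identifier)])
DT1[lookup, on = "description", identifier := i.identifier]

print(DT1)
```

This assumes, as in the sample data, that each description maps to exactly one identifier. If that does not hold, the lookup join would need both columns in the grid instead.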

Regarding "r - Fill in missing rows with R data.table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30220354/
