r - How do I subset a data.frame?

Reposted. Author: 行者123. Updated: 2023-12-01 07:25:41

I have a dataset like this:

a <- data.frame(var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"),
var2 = as.Date(c("2015-01-02","2015-01-04","2015-02-02","2015-02-06","2015-01-02","2015-01-07","2015-04-02")),
var3 = c(F, T, F, F, F, T, F)
)
sequ <- rle(as.character(a$var1))
a$sequ <- sequence(sequ$lengths)

which produces
> a
var1 var2 var3 sequ
1 patientA 2015-01-02 FALSE 1
2 patientA 2015-01-04 TRUE 2
3 patientA 2015-02-02 FALSE 3
4 patientB 2015-02-06 FALSE 1
5 patientB 2015-01-02 FALSE 2
6 patientB 2015-01-07 TRUE 3
7 patientB 2015-04-02 FALSE 4

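How can I subset/filter this dataset so that, for each patient (var1), I get the row where var3 == TRUE together with all rows whose var2 date is later than the date in that var3 == TRUE row? I tried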
subset(a, (var3 == TRUE) & (var2 > var3))

but this does not yield the correct result set. The correct result would be
#       var1       var2  var3 sequ
# 1 patientA 2015-01-04 TRUE 2
# 2 patientA 2015-02-02 FALSE 3
# 3 patientB 2015-02-06 FALSE 1
# 4 patientB 2015-01-07 TRUE 3
# 5 patientB 2015-04-02 FALSE 4

Best answer

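You can try data.table. Here, we convert the 'data.frame' to a 'data.table' (setDT(a)). Grouped by 'var1', we get the logical index of the 'var2' elements that are greater than or equal to the 'var2' element of the row where 'var3' is TRUE, and use that index to subset the per-group data (.SD).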

library(data.table)
setDT(a)[,.SD[var2 >= var2[var3]], var1]
# var1 var2 var3 sequ
#1: patientA 2015-01-04 TRUE 2
#2: patientA 2015-02-02 FALSE 3
#3: patientB 2015-02-06 FALSE 1
#4: patientB 2015-01-07 TRUE 3
#5: patientB 2015-04-02 FALSE 4
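
To see what the grouped expression is doing, the same per-patient logic can be sketched in base R with split() and lapply(). This is only an illustration of the idea, not how data.table evaluates it internally, and it assumes exactly one row with var3 == TRUE per patient. The names groups, kept and result are made up for this sketch.

```r
# Sample data from the question (TRUE/FALSE spelled out instead of T/F).
a <- data.frame(
  var1 = c("patientA", "patientA", "patientA",
           "patientB", "patientB", "patientB", "patientB"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-02-06",
                   "2015-01-02", "2015-01-07", "2015-04-02")),
  var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# One data.frame per patient, in the spirit of grouping by 'var1'.
groups <- split(a, a$var1)

# Within each patient, keep the rows dated on or after the flagged row.
# Assumption: each patient has exactly one row with var3 == TRUE.
kept <- lapply(groups, function(d) d[d$var2 >= d$var2[d$var3], ])

# Stack the per-patient pieces back into a single data.frame.
result <- do.call(rbind, kept)
rownames(result) <- NULL
print(result)
```

Each element of groups plays roughly the role that .SD plays inside the data.table call: the subset of rows belonging to one value of var1.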

An option using base R (assuming the data is ordered by 'var1'):
a[with(a, var2>=rep(var2[var3], table(var1))),]
# var1 var2 var3 sequ
#2 patientA 2015-01-04 TRUE 2
#3 patientA 2015-02-02 FALSE 3
#4 patientB 2015-02-06 FALSE 1
#6 patientB 2015-01-07 TRUE 3
#7 patientB 2015-04-02 FALSE 4
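
The rep()/table() approach lines up correctly only when the rows are ordered by 'var1'. If that cannot be guaranteed, one possible variant is to look up each row's cutoff date by patient name. The sketch below is an alternative, not part of the original answer. It still assumes exactly one var3 == TRUE row per patient, and the names shuffled, flagged, cutoff and keep are introduced here for illustration.

```r
# Sample data from the question.
a <- data.frame(
  var1 = c("patientA", "patientA", "patientA",
           "patientB", "patientB", "patientB", "patientB"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-02-06",
                   "2015-01-02", "2015-01-07", "2015-04-02")),
  var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Deliberately interleave the patients so the rows are NOT ordered by var1.
shuffled <- a[c(4, 1, 6, 2, 7, 3, 5), ]

# One cutoff per patient: the date of the flagged row, stored as a number
# and named by patient (assumes exactly one flagged row per patient).
flagged <- shuffled[shuffled$var3, ]
cutoff <- setNames(as.numeric(flagged$var2), as.character(flagged$var1))

# Compare every row's date against its own patient's cutoff.
keep <- as.numeric(shuffled$var2) >= cutoff[as.character(shuffled$var1)]
result <- shuffled[unname(keep), ]
print(result)
```

The original row names are preserved by the subsetting, so the kept rows can be checked against the row numbers 2, 3, 4, 6 and 7 shown in the output above.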

Regarding "r - How do I subset a data.frame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30036905/
