
r - Join by overlapping time periods while aggregating some values

Reposted. Author: 行者123. Updated: 2023-12-04 07:39:00

I am trying to join a table of periods like this one:

library(data.table)

id = c(rep(1,3), rep(2,3), rep(3,3))
start = as.Date(c("2014-07-01", "2015-03-12", "2016-08-13", "2014-07-01", "2015-03-12", "2016-08-13", "2014-07-01", "2015-03-12", "2016-08-13"))
end = as.Date(c("2015-03-11", "2015-08-12", "2018-12-31", "2015-03-11", "2015-08-12", "2018-12-31","2015-03-11", "2015-08-12", "2018-12-31"))

DT = data.table(id, start, end)

DT

   id      start        end
1:  1 2014-07-01 2015-03-11
2:  1 2015-03-12 2015-08-12
3:  1 2016-08-13 2018-12-31
4:  2 2014-07-01 2015-03-11
5:  2 2015-03-12 2015-08-12
6:  2 2016-08-13 2018-12-31
7:  3 2014-07-01 2015-03-11
8:  3 2015-03-12 2015-08-12
9:  3 2016-08-13 2018-12-31

with a clinical registry (weight and height) like this one:

id_clin = c(rep(1,2), rep(2,3), rep(3,4))
date = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01", "2014-08-01", "2015-02-01", "2017-06-01", "2018-03-05", "2018-09-01", "2018-11-30"))
weight = c(60, 65, 62, 75, 68, 90 , 102, 104 , 98 )
height = c(160, 160, 170, 175, 170, 200, 200, 200 ,200)

DT_clin = data.table(id_clin, date, weight, height)

DT_clin

   id_clin       date weight height
1:       1 2014-10-23     60    160
2:       1 2016-09-01     65    160
3:       2 2017-01-01     62    170
4:       2 2014-08-01     75    175
5:       2 2015-02-01     68    170
6:       3 2017-06-01     90    200
7:       3 2018-03-05    102    200
8:       3 2018-09-01    104    200
9:       3 2018-11-30     98    200
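
  • When a clinical measurement (DT_clin) for an id falls between the start and end of one of that same id's periods (DT), the measurement values must be joined to that period.
  • If DT_clin has no value within a DT period, nothing needs to be joined.
  • If several values fall within a DT period, I want the mean of the overlapping values.

The desired result looks like this*:
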
   id      start        end       date      date2 weight height
1:  1 2014-07-01 2015-03-11 2014-10-23 2014-10-23   60.0  160.0
2:  1 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
3:  1 2016-08-13 2018-12-31 2016-09-01 2016-09-01   65.0  160.0
4:  2 2014-07-01 2015-03-11 2014-08-01 2015-02-01   71.5  172.5
5:  2 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
6:  2 2016-08-13 2018-12-31 2017-01-01 2017-01-01   62.0  170.0
7:  3 2014-07-01 2015-03-11       <NA>       <NA>     NA     NA
8:  3 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
9:  3 2016-08-13 2018-12-31 2018-03-05 2018-11-30  101.3  200.0
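
Also, if there is a way to apply different operations to different variables, I would be interested in that too (for example, computing the mean of weight and the max of height in the same step as the join).

I have tested foverlaps and it gives good results when only one value overlaps, but I cannot get what I want when several values overlap:

setkey(DT, id, start, end)
setkey(DT_clin, id_clin, date, date2)

foverlaps(DT[id == 1], DT_clin[id_clin == 1],
          by.x = c("id", "start", "end"),
          by.y = c("id_clin", "date", "date2"),
          nomatch = NA)

Should I use a non-equi join instead?

Thanks in advance for any help :)

*I duplicated date to create date2 and fake a time interval.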

Best answer

With foverlaps:

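library(data.table)

# foverlaps needs an interval on both sides, so date is duplicated
# into date2 to form a zero-length interval
DT_clin[, date2 := date]
setkey(DT_clin, id_clin, date, date2)

# join every period to the measurements it contains, then aggregate per period
foverlaps(DT, DT_clin,
          by.x = c("id", "start", "end"),
          by.y = c("id_clin", "date", "date2"),
          nomatch = NA)[
  , .(datemin = min(date),
      datemax = max(date),
      weight  = mean(weight, na.rm = TRUE),
      height  = mean(height, na.rm = TRUE)),
  by = .(id, start, end)]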

   id      start        end    datemin    datemax weight height
1:  1 2014-07-01 2015-03-11 2014-10-23 2014-10-23   60.0  160.0
2:  1 2015-03-12 2015-08-12       <NA>       <NA>    NaN    NaN
3:  1 2016-08-13 2018-12-31 2016-09-01 2016-09-01   65.0  160.0
4:  2 2014-07-01 2015-03-11 2014-08-01 2015-02-01   71.5  172.5
5:  2 2015-03-12 2015-08-12       <NA>       <NA>    NaN    NaN
6:  2 2016-08-13 2018-12-31 2017-01-01 2017-01-01   62.0  170.0
7:  3 2014-07-01 2015-03-11       <NA>       <NA>    NaN    NaN
8:  3 2015-03-12 2015-08-12       <NA>       <NA>    NaN    NaN
9:  3 2016-08-13 2018-12-31 2017-06-01 2018-11-30   98.5  200.0
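
On the question of a non-equi join: that route should also work, and it avoids the fake date2 column. The following is only a sketch under the question's sample data, not part of the original answer. The names joined and res are made up for illustration, and the aggregation uses mean for weight and max for height to show that each variable can get its own function.

```r
library(data.table)

# Sample data as in the question
DT <- data.table(
  id    = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  start = as.Date(rep(c("2014-07-01", "2015-03-12", "2016-08-13"), 3)),
  end   = as.Date(rep(c("2015-03-11", "2015-08-12", "2018-12-31"), 3))
)
DT_clin <- data.table(
  id_clin = c(rep(1, 2), rep(2, 3), rep(3, 4)),
  date    = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01",
                      "2014-08-01", "2015-02-01", "2017-06-01",
                      "2018-03-05", "2018-09-01", "2018-11-30")),
  weight  = c(60, 65, 62, 75, 68, 90, 102, 104, 98),
  height  = c(160, 160, 170, 175, 170, 200, 200, 200, 200)
)

# Step 1: non-equi join. Each period in DT picks up every measurement of
# the same id whose date lies in [start, end]. Periods without a match are
# kept with NA measurements (the default nomatch = NA).
# The x. / i. prefixes make explicit which table a column comes from.
joined <- DT_clin[DT,
  on = .(id_clin = id, date >= start, date <= end),
  .(id = i.id, start = i.start, end = i.end,
    date = x.date, weight = x.weight, height = x.height)]

# Step 2: one row per period, with a different function per variable.
# na.rm is left unset on purpose so that empty periods yield NA rather
# than NaN or -Inf; this assumes weight and height have no NAs themselves.
res <- joined[, .(datemin = min(date),
                  datemax = max(date),
                  weight  = mean(weight),
                  height  = max(height)),
              by = .(id, start, end)]
res
```

For id 3, the last period contains all four measurements (2017-06-01 through 2018-11-30), so the mean weight is 98.5, as in the foverlaps output above.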

Regarding "r - Join by overlapping time periods while aggregating some values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67588712/
