
r - How to merge consecutive rows into one row based on a condition

Reposted. Author: 行者123. Updated: 2023-12-04 14:22:38

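I have a data frame of hospital admission events, each with a patient ID and a date.

Problem

I want to merge any row whose HospNum_Id is the same as the previous row's and whose date is within 3 days of that row's date.

Input

A synthetic dataset is shown here: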

structure(list(HospNum_Id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A791697", "V682805", "X608693"
), class = "factor"), VisitDate = structure(c(17181, 17183, 17192,
17168, 17169, 17186, 17189, 17212, 17215, 17167, 17173, 17190
), class = "Date"), diffDate = structure(c(-2, -9, NA, -1, -17,
-3, -23, -3, NA, -6, -17, NA), class = "difftime", units = "days")), .Names = c("HospNum_Id",
"VisitDate", "diffDate"), row.names = c(NA, -12L), class = "data.frame")

My attempt

The steps I have taken are:

1. Sort the rows by patient ID and visit date

Mydf<-Mydf[order(Mydf$HospNum_Id,Mydf$VisitDate),]

2. Add a date-difference column

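library(rlang)
library(dplyr)

# Sort by patient and date, then add the gap to the next visit within each patient
SurveilTimeByRow <- function(dataframe, HospNum_Id, VisitDate) {
  # Turn the column-name strings into symbols for tidy evaluation
  HospNum_Ida <- sym(HospNum_Id)
  VisitDatea <- sym(VisitDate)
  ret <- dataframe %>%
    arrange(!!HospNum_Ida, !!VisitDatea) %>%
    group_by(!!HospNum_Ida) %>%
    # current visit minus next visit: negative values, NA on each patient's last row
    mutate(diffDate = difftime(as.Date(!!VisitDatea),
                               lead(as.Date(!!VisitDatea), 1),
                               units = "days"))
  data.frame(ret)
}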

Mydf <- SurveilTimeByRow(Mydf, "HospNum_Id", "VisitDate")
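To see why the diffDate values in the sample data are negative, here is a minimal sketch of the same calculation on the three visits of patient A791697. The vector name `visits` is only illustrative; the dates are taken from the input above.

```r
library(dplyr)

# The three visits of patient A791697 from the sample data
visits <- as.Date(c("2017-01-15", "2017-01-17", "2017-01-26"))

# Current visit minus the following visit, as in SurveilTimeByRow():
# the following visit is later, so each gap is negative,
# and the last visit has no following visit, so its gap is NA
diffDate <- difftime(visits, lead(visits), units = "days")
as.numeric(diffDate)
# -2 -9 NA
```

Because the gap is stored on the earlier row of each pair, a row with a diffDate between -3 and 0 is one that should be combined with the row after it.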

3. If a row's diffDate is >= -3 and <= 3, join that row and the one after it into a single row

This is the part I am stuck on.

Desired output

HospNum_Id  VisitDate   diffDate  HospNum_Id.1  VisitDate.1  diffDate.1
A791697     2017-01-15  -2 days   A791697       2017-01-17   -9 days
V682805     2017-01-02  -1 days   V682805       2017-01-03   -17 days
V682805     2017-01-20  -3 days   V682805       2017-01-23   -23 days
V682805     2017-02-15  -3 days   V682805       2017-02-18   NA days

I will drop the last column, diffDate.1, since it will end up being redundant.

Best answer

Here is one possible solution, using the data you posted, stored as df:

library(tidyverse)

# create an id to flag consecutive rows within each HospNum
df %>%
group_by(HospNum_Id) %>%
mutate(id = ceiling(row_number() / 2)) %>%
ungroup() -> df2

# split to even and odd rows within each HospNum
df_odd = df2 %>% group_by(HospNum_Id) %>% filter(row_number() %in% seq(1, nrow(df2), 2)) %>% ungroup()
df_even = df2 %>% group_by(HospNum_Id) %>% filter(row_number() %in% seq(2, nrow(df2), 2)) %>% ungroup()

# join on both ids and remove rows
inner_join(df_odd, df_even, by=c("id","HospNum_Id")) %>%
filter(between(diffDate.x, -3, 3) & !is.na(diffDate.y)) %>%
select(-id)

# # A tibble: 3 x 5
# HospNum_Id VisitDate.x diffDate.x VisitDate.y diffDate.y
# <fct> <date> <time> <date> <time>
# 1 A791697 2017-01-15 -2 days 2017-01-17 " -9 days"
# 2 V682805 2017-01-02 -1 days 2017-01-03 -17 days
# 3 V682805 2017-01-20 -3 days 2017-01-23 -23 days
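As a small illustration of how the `id` column pairs rows, the sketch below applies the same `ceiling(row_number() / 2)` arithmetic to six row positions. The name `pair_id` is hypothetical and used only here.

```r
# Row positions 1..6 within one patient, mapped to pair ids
pair_id <- ceiling(seq_len(6) / 2)
pair_id
# 1 1 2 2 3 3
```

Rows 1 and 2 share an id, as do rows 3 and 4, and rows 5 and 6. Rows 2 and 3 never share an id, so with this approach two visits that are close in time but sit at positions 2 and 3 would not be joined.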

You can combine the logic above into a single pipe chain like this:

df %>%
  group_by(HospNum_Id) %>%
  mutate(id = ceiling(row_number() / 2),
         even_row = row_number() %in% seq(2, nrow(df), 2)) %>%
  ungroup() %>%
  # tidyr >= 1.0 syntax; older versions wrote this as nest(-even_row)
  nest(data = -even_row) %>%
  pull(data) %>%
  # join the odd-row table to the even-row table on the pair id
  reduce(function(x, y) inner_join(x, y, by = c("id", "HospNum_Id"))) %>%
  filter(between(diffDate.x, -3, 3) & !is.na(diffDate.y)) %>%
  select(-id)
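The desired output in the question also lists the V682805 pair 2017-02-15 / 2017-02-18, which the `!is.na(diffDate.y)` filter removes. As an alternative sketch, not part of the answer above, each row can be compared directly with the row that follows it by using `lead()`. The names `visits_df`, `paired`, and `gap_days` are assumptions made for this illustration; the dates are the VisitDate values from the question's dataset.

```r
library(dplyr)

# The question's synthetic data, rebuilt with readable dates
visits_df <- data.frame(
  HospNum_Id = c(rep("A791697", 3), rep("V682805", 6), rep("X608693", 3)),
  VisitDate = as.Date(c(
    "2017-01-15", "2017-01-17", "2017-01-26",
    "2017-01-02", "2017-01-03", "2017-01-20",
    "2017-01-23", "2017-02-15", "2017-02-18",
    "2017-01-01", "2017-01-07", "2017-01-24"
  ))
)

paired <- visits_df %>%
  arrange(HospNum_Id, VisitDate) %>%
  group_by(HospNum_Id) %>%
  mutate(
    # the next visit of the same patient (NA on the patient's last row)
    VisitDate.1 = lead(VisitDate),
    # days from this visit to the next one
    gap_days = as.numeric(difftime(VisitDate.1, VisitDate, units = "days"))
  ) %>%
  ungroup() %>%
  # keep only rows whose next visit is at most 3 days later
  filter(!is.na(gap_days), gap_days <= 3)

paired
```

On this data `paired` has four rows, one per pair listed in the desired output, and a pair is found wherever it sits within a patient's visits. If a patient had three or more visits in a row that are each within 3 days, this sketch would return overlapping pairs, so it would need adjusting for that case.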

Regarding "r - How to merge consecutive rows into one row based on a condition", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52407068/
