
r - Count the number of days per year from a data frame

Reposted. Author: 行者123. Updated: 2023-12-04 09:42:33

I have a data frame similar to this:

df<-read.csv(text="id;census;startDate;endDate
ZF001;died;16.10.2012;16.05.2015
ZF002;alive;20.10.2013;
ZF003;alive;04.11.2013;
ZF004;died;11.11.2013;20.12.2014
ZF005;died;25.11.2013;16.06.2015
ZF006;alive;25.11.2014;
ZF007;survived;02.12.2014;19.01.2015
ZF008;alive;11.12.2014;
ZF009;survived;28.01.2015;12.03.2015", sep=";")

df$startDate<-as.Date(df$startDate, "%d.%m.%Y")
df$endDate<-as.Date(df$endDate, "%d.%m.%Y")

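What I need is a new data frame containing the number of days per year that each proband took part in the study. It should look something like this: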
id     year days
ZF001 2012 77
ZF001 2013 365
ZF001 2014 365
ZF001 2015 135
etc.

Best answer

I'm assuming you only want this for probands who died (since the living ones have no end date). Here is a possible, almost self-explanatory data.table solution:

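library(data.table)
# For each id with census == "died": list every day in the interval,
# extract its year, and tabulate the number of days per year.
setDT(df)[census == "died",
          as.data.table(table(year(seq.Date(startDate, endDate, by = "day")))),
          by = id]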
# id V1 N
# 1: ZF001 2012 77
# 2: ZF001 2013 365
# 3: ZF001 2014 365
# 4: ZF001 2015 136
# 5: ZF004 2013 51
# 6: ZF004 2014 354
# 7: ZF005 2013 37
# 8: ZF005 2014 365
# 9: ZF005 2015 167

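Basically, for each id we generate every day from the start date to the end date (both included), use the year function to extract the year of each day, and then simply count the frequencies.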
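The steps above can be traced for a single proband in base R. This is a minimal sketch, assuming ZF001's dates from the question; `format(days, "%Y")` stands in for data.table's `year()` so that the snippet needs no packages, and the variable names are illustrative only.

```r
# Illustrative walk-through for one proband (ZF001), base R only.
start <- as.Date("16.10.2012", "%d.%m.%Y")
end   <- as.Date("16.05.2015", "%d.%m.%Y")

# 1. Enumerate every calendar day from start to end, both endpoints included.
days <- seq(start, end, by = "day")

# 2. Extract the year of each day (stand-in for data.table::year()).
yrs <- format(days, "%Y")

# 3. Count how many days fall into each year.
counts <- table(yrs)
print(counts)

# Plain date difference within 2015 (end day minus 1 January),
# which does not count the first day of the interval.
diff_2015 <- as.numeric(end - as.Date("2015-01-01"))
print(diff_2015)
```

Because `seq` includes both endpoints, 2015 contributes 136 days, whereas the plain date difference is 135, the figure listed in the question's expected output. If the end day should not be counted, one option is to generate the sequence up to `end - 1` instead.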

Or an equivalent dplyr solution:
library(dplyr)
library(data.table)  # provides year(); lubridate::year() would also work
df %>%
  filter(census == "died") %>%
  group_by(id) %>%
  do(as.data.frame(table(year(seq.Date(.$startDate, .$endDate, by = "day")))))

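Edit, per the comments:
If you want this for all patients (dead or alive), and for the living ones you want to use Sys.Date as the end date, we can define a simple helper function. Note that the 2015 counts for the living probands in the output depend on the day the code is run.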
dateFunc <- function(x, y){
  # When no end date is recorded, count up to today's date instead
  if(is.na(y)) {
    as.data.table(table(year(seq.Date(x, Sys.Date(), by = "day"))))
  } else as.data.table(table(year(seq.Date(x, y, by = "day"))))
}

setDT(df)[, setNames(dateFunc(startDate, endDate), c("Year", "Days")), by = id]
# id Year Days
# 1: ZF001 2012 77
# 2: ZF001 2013 365
# 3: ZF001 2014 365
# 4: ZF001 2015 136
# 5: ZF002 2013 73
# 6: ZF002 2014 365
# 7: ZF002 2015 222
# 8: ZF003 2013 58
# 9: ZF003 2014 365
# 10: ZF003 2015 222
# 11: ZF004 2013 51
# 12: ZF004 2014 354
# 13: ZF005 2013 37
# 14: ZF005 2014 365
# 15: ZF005 2015 167
# 16: ZF006 2014 37
# 17: ZF006 2015 222
# 18: ZF007 2014 30
# 19: ZF007 2015 19
# 20: ZF008 2014 21
# 21: ZF008 2015 222
# 22: ZF009 2015 44
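Since Sys.Date() changes every day, the counts for living probands cannot be reproduced later. The following is a minimal sketch of one way around that, assuming a hypothetical helper `days_per_year` with a hypothetical `ref_date` argument (neither is part of the original answer). It uses base R only, and the cutoff 2015-08-10 is chosen here because it reproduces the 222 days shown for 2015 in the output above.

```r
# Hypothetical helper (not from the original answer): days per calendar year
# between start and end, both included. A missing end date is replaced by
# `ref_date`, an assumed explicit cutoff that defaults to today's date.
days_per_year <- function(start, end, ref_date = Sys.Date()) {
  if (is.na(end)) end <- ref_date
  yrs <- format(seq(start, end, by = "day"), "%Y")
  counts <- table(yrs)
  data.frame(Year = as.integer(names(counts)),
             Days = as.integer(counts))
}

# ZF006 from the question: started 25.11.2014, still alive (no end date).
res <- days_per_year(as.Date("2014-11-25"), as.Date(NA),
                     ref_date = as.Date("2015-08-10"))
print(res)
```

Passing the cutoff explicitly keeps the default behaviour of the answer's dateFunc (today's date) while letting a fixed date be supplied when the result needs to be repeatable.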

Data
df <- structure(list(id = structure(1:9, .Label = c("ZF001", "ZF002", 
"ZF003", "ZF004", "ZF005", "ZF006", "ZF007", "ZF008", "ZF009"
), class = "factor"), census = structure(c(2L, 1L, 1L, 2L, 2L,
1L, 3L, 1L, 3L), .Label = c("alive", "died", "survived"), class = "factor"),
startDate = structure(c(15629, 15998, 16013, 16020, 16034,
16399, 16406, 16415, 16463), class = "Date"), endDate = structure(c(16571,
NA, NA, 16424, 16602, NA, 16454, NA, 16506), class = "Date")), .Names = c("id",
"census", "startDate", "endDate"), row.names = c(NA, -9L), class = "data.frame")

Regarding "r - Count the number of days per year from a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31915790/
