
r - Survival analysis on binned data

Reposted. Author: 行者123. Updated: 2023-12-04 02:09:25

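I have binned the data I want to run a survival analysis on; example data is below. n is the count of units for each combination of group, time, and failure indicator.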

> df <- structure(list(group = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "A", "B"), class = "factor"), t = c(0L, 1L, 2L, 3L, 1L, 2L, 3L, 0L, 1L, 2L, 3L, 1L, 2L, 3L), failure = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), n = c(40000L, 30000L, 20000L, 10000L, 5L, 4L, 3L, 20000L, 15000L, 14000L, 11000L, 10L, 6L, 4L)), .Names = c("group", "t", "failure", "n"), row.names = c(NA, 14L), class = "data.frame")
> df
group t failure n
1 A 0 0 40000
2 A 1 0 30000
3 A 2 0 20000
4 A 3 0 10000
5 A 1 1 5
6 A 2 1 4
7 A 3 1 3
8 B 0 0 20000
9 B 1 0 15000
10 B 2 0 14000
11 B 3 0 11000
12 B 1 1 10
13 B 2 1 6
14 B 3 1 4

I know I can use rep to repeat the rows of df according to the n column, so that each row is one unit:
(see "How do I create a survival object in R?")
> library(survival)
> df2 <- df[rep(rownames(df),df$n),]
> sfit <- survfit(Surv(t,failure)~group, data = df2)

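However, my actual data has about 10 million units. Is there a way to run the survival analysis with a count/frequency variable, so that I avoid creating a 10-million-row data frame?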

Best answer

You need to use the weights argument of survfit. You can compare the two approaches to confirm that they give the same output.
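As a rough illustration of why frequency weights are enough here: the Kaplan-Meier estimate only needs, at each event time, the number of units at risk and the number of failures, and both can be summed straight from the bin counts. The sketch below does that in base R. The helper km_from_counts is hypothetical (it is not part of the survival package), and the data frame is a compact rebuild of the example data from the question.

```r
# Example data from the question, rebuilt compactly (one row per bin).
df <- data.frame(
  group   = rep(c("A", "B"), each = 7),
  t       = rep(c(0, 1, 2, 3, 1, 2, 3), times = 2),
  failure = rep(c(0, 0, 0, 0, 1, 1, 1), times = 2),
  n       = c(40000, 30000, 20000, 10000, 5, 4, 3,
              20000, 15000, 14000, 11000, 10, 6, 4)
)

# Hypothetical helper, not from the survival package:
# Kaplan-Meier estimate computed directly from binned counts.
km_from_counts <- function(t, failure, n) {
  # Distinct times at which at least one failure occurred.
  event_times <- sort(unique(t[failure == 1]))
  # Units still under observation at each event time (failed or censored at u or later).
  at_risk <- sapply(event_times, function(u) sum(n[t >= u]))
  # Units that failed exactly at each event time.
  events <- sapply(event_times, function(u) sum(n[t == u & failure == 1]))
  data.frame(
    time     = event_times,
    n.risk   = at_risk,
    n.event  = events,
    survival = cumprod(1 - events / at_risk)
  )
}

a <- df[df$group == "A", ]
km_a <- km_from_counts(a$t, a$failure, a$n)
print(km_a)
```

For group A the at-risk counts come out as 60012, 30007 and 10003, with 5, 4 and 3 events. No row is ever replicated, so the cost depends on the number of bins rather than the number of units.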

Using your replicated data:

sfit <- survfit(Surv(t,failure)~group, data = df2)
summary(sfit)
Call: survfit(formula = Surv(t, failure) ~ group, data = df2)

group=A
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 60012 5 1.000 3.73e-05 1.000 1
2 30007 4 1.000 7.63e-05 1.000 1
3 10003 3 0.999 1.89e-04 0.999 1

group=B
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 40020 10 1.000 0.000079 1.000 1
2 25010 6 1.000 0.000126 0.999 1
3 11004 4 0.999 0.000221 0.999 1

Using weights:
weights <- df$n
sfit2 <- survfit(Surv(t,failure)~group, data = df, weights = weights)
summary(sfit2)
Call: survfit(formula = Surv(t, failure) ~ group, data = df, weights = weights)

group=A
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 60012 5 1.000 3.73e-05 1.000 1
2 30007 4 1.000 7.63e-05 1.000 1
3 10003 3 0.999 1.89e-04 0.999 1

group=B
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 40020 10 1.000 0.000079 1.000 1
2 25010 6 1.000 0.000126 0.999 1
3 11004 4 0.999 0.000221 0.999 1
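Rather than comparing the two printed summaries by eye, the agreement can also be checked programmatically. The following is a minimal sketch that assumes the survival package is installed. It rebuilds the example data, fits both versions, and compares the components of the two summary objects. The names s_expanded, s_weighted and same are chosen here for illustration. The weights are passed as the column name n, which survfit looks up in data.

```r
library(survival)

# Example data from the question, rebuilt compactly (one row per bin).
df <- data.frame(
  group   = rep(c("A", "B"), each = 7),
  t       = rep(c(0, 1, 2, 3, 1, 2, 3), times = 2),
  failure = rep(c(0, 0, 0, 0, 1, 1, 1), times = 2),
  n       = c(40000, 30000, 20000, 10000, 5, 4, 3,
              20000, 15000, 14000, 11000, 10, 6, 4)
)

# Expanded fit: one row per unit (only feasible for small data).
df2 <- df[rep(seq_len(nrow(df)), df$n), ]
s_expanded <- summary(survfit(Surv(t, failure) ~ group, data = df2))

# Weighted fit: one row per bin, with the counts as case weights.
s_weighted <- summary(survfit(Surv(t, failure) ~ group, data = df, weights = n))

# Compare the key components of the two summaries.
same <- c(
  time    = isTRUE(all.equal(as.numeric(s_expanded$time),    as.numeric(s_weighted$time))),
  n.risk  = isTRUE(all.equal(as.numeric(s_expanded$n.risk),  as.numeric(s_weighted$n.risk))),
  n.event = isTRUE(all.equal(as.numeric(s_expanded$n.event), as.numeric(s_weighted$n.event))),
  surv    = isTRUE(all.equal(as.numeric(s_expanded$surv),    as.numeric(s_weighted$surv)))
)
print(same)
```

On the real data only the weighted call is needed; the expanded fit is included here purely as a cross-check on the small example.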

Regarding "r - Survival analysis on binned data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37711599/
