r - Estimating density from interval [start, stop] data in R

Reposted by: 行者123 · Updated: 2023-12-05 07:01:49

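Description

This question is motivated by clinical/epidemiological research, where studies typically enroll patients and then follow them for varying lengths of time.

The age distribution at study entry is usually of interest and easy to assess, but occasionally the age distribution at any time during the study is of interest as well.

My question is: is there a way to estimate such a density from interval data (e.g. [age_start, age_stop]) without expanding the data as shown below? The long-format approach seems inelegant, not to mention its memory footprint!

Reproducible example using data from the survival package

#### Prep Data ####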
library(survival)
library(ggplot2)
library(dplyr)

data(colon, package = 'survival')
# example using the colon dataset from the survival package
ccdeath <- colon %>%
  # use data on time to death (not recurrence)
  filter(etype == 2) %>%
  # age at end of follow-up (death or censoring)
  mutate(age_last = age + (time / 365.25))

#### Distribution Using Single Value ####
# age at study entry
ggplot(ccdeath, aes(x = age)) +
  geom_density() +
  labs(title = "Figure 1: Density of age at study entry",
       x = "Age at Entry (years)",
       y = "Density")

#### Using Person-Month Level Data ####
# create counting-process/person-time dataset
ccdeath_cp <- survSplit(Surv(age, age_last, status) ~ ., 
                        data = ccdeath,
                        cut = seq(from = floor(min(ccdeath$age)),
                                  to = ceiling(max(ccdeath$age_last)),
                                  by = 1/12))

nrow(ccdeath_cp) # over 50,000 rows

# distribution of age at person-month level
ggplot(ccdeath_cp, aes(x = age)) +
  geom_density() +
  labs(title = "Figure 2: Density based on approximate person-months",
       x = "Age (years)",
       y = "Density")

#### Using Person-Day Level Data ####
# create counting-process/person-time dataset
ccdeath_cp <- survSplit(Surv(age, age_last, status) ~ ., 
                        data = ccdeath,
                        cut = seq(from = floor(min(ccdeath$age)),
                                  to = ceiling(max(ccdeath$age_last)),
                                  by = 1/365.25))

nrow(ccdeath_cp) # over 1.5 million rows!

# distribution of age at person-day level
ggplot(ccdeath_cp, aes(x = age)) +
  geom_density() +
  labs(title = "Figure 3: Density based on person-days",
       x = "Age (years)",
       y = "Density")

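Note: although I tagged this question "survival" because I thought it would attract people familiar with the field, I am not interested in time-to-event here, only in the overall age distribution across all study time.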

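Best answer

Rather than computing over finer and finer time intervals, you can just keep a cumulative count of the number of patients at each age: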

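library(data.table)

setDT(ccdeath)  # convert the data frame to a data.table by reference
# +1 at each entry age, -1 at each exit age, then the net change per distinct age
x <- rbind(
  ccdeath[, .(age = age, num_patients = 1)],
  ccdeath[, .(age = age_last, num_patients = -1)]
)[, .(num_patients = sum(num_patients)), keyby = age]

# x is already keyed (sorted) by age with one row per age; the join keeps that order
cccdeath <- x[x[, .(age = unique(age))], on = 'age']
# running total = number of patients under follow-up at each age
cccdeath[, num_patients := cumsum(num_patients)]
ggplot(cccdeath, aes(x = age, y = num_patients)) + geom_step()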
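
To make the counting idea concrete, here is a minimal base R sketch on a small made-up set of intervals (the data frame `iv` and its values are hypothetical, not taken from the colon data). It is only an illustration of the +1/-1 bookkeeping, under the assumption that each interval is treated as [start, stop):

```r
# Toy intervals (hypothetical data): one row per patient, [start, stop) in years
iv <- data.frame(start = c(50, 52, 52, 60),
                 stop  = c(55, 53, 61, 62))

# +1 event at every entry age, -1 event at every exit age
events <- data.frame(age   = c(iv$start, iv$stop),
                     delta = c(rep(1, nrow(iv)), rep(-1, nrow(iv))))

# Net change at each distinct age, sorted by age
net <- aggregate(delta ~ age, data = events, FUN = sum)
net <- net[order(net$age), ]

# Running total = number of patients under follow-up from this age to the next
net$at_risk <- cumsum(net$delta)
print(net)
```

The result has one row per distinct entry or exit age, so its size depends on the number of patients rather than on how finely time is cut.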

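The sawtooth pattern arises because every patient's starting age is recorded as an integer. I had some thoughts on how to smooth it and came up with this idea: assign equal probability to a set of evenly spaced ages between a given age and age + 1. You get something like this: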

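# number of evenly spaced sub-year offsets per patient (12 = monthly)
smooth_param <- 12
x <- rbindlist(lapply(
  # midpoints of the sub-intervals: 0.5/12, 1.5/12, ..., 11.5/12
  (seq_len(smooth_param) - 0.5) / smooth_param,
  function(s) {
    # shift entry and exit by the same offset, each copy weighted 1/smooth_param
    rbind(
      ccdeath[, .(age = age + s, num_patients = 1 / smooth_param)],
      ccdeath[, .(age = age_last + s, num_patients = -1 / smooth_param)]
    )
  }
))[, .(num_patients = sum(num_patients)), keyby = age]

cccdeath <- x[x[, .(age = sort(unique(age)))], on = 'age']
# running total of the fractional counts
cccdeath[, num_patients := cumsum(num_patients)]
ggplot(cccdeath, aes(x = age, y = num_patients)) + geom_step()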
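
The plots above show patient counts rather than a density. One possible way to put the step function on a density scale, sketched here on made-up intervals (the data frame `iv` is hypothetical, and this normalization is an assumption rather than part of the original answer), is to divide the count on each piece by the total person-time so that the area under the curve is 1:

```r
# Toy intervals (hypothetical data): one row per patient, [start, stop) in years
iv <- data.frame(start = c(50, 52, 52, 60),
                 stop  = c(55, 53, 61, 62))
breaks <- sort(unique(c(iv$start, iv$stop)))

# Number under follow-up on each piece [breaks[i], breaks[i + 1])
at_risk <- vapply(head(breaks, -1),
                  function(a) as.numeric(sum(iv$start <= a & iv$stop > a)),
                  numeric(1))

widths      <- diff(breaks)
person_time <- sum(at_risk * widths)  # same as sum(iv$stop - iv$start)
dens        <- at_risk / person_time  # piecewise-constant density

# Check that the density integrates to 1
sum(dens * widths)
```

The same division could be applied to the `num_patients` column produced above, using the total follow-up time `sum(age_last - age)` as the denominator.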

Regarding "r - Estimating density from interval [start, stop] data in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63746044/