gpt4 book ai didi

r - 如何在 ggplot2 R 中为零膨胀负二项式函数添加置信区间?

转载 作者:行者123 更新时间:2023-12-04 10:49:40 25 4
gpt4 key购买 nike

我有一个数据集,它被建模为具有混合效应的零膨胀负二项式。我想从模型预测中获取置信区间并绘制模型均值和置信区间。我试图绘制模型均值。有人可以让我知道这是否是正确的方法吗?我不知道如何将模型的置信区间绘制到 ggplot2 上。我想绘制数据的预测平均值及其置信区间。我在下面的代码中对图表进行了基本尝试。

library(pscl)
library(lmtest)

df <- data.frame(
fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"),
level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"),
repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55)
)

model <- formula(repro ∼ fertilizer + level | fertilizer * level)
modelzinb <- zeroinfl(model, dist = "negbin", link = "logit",data =df)
summary(modelzinb)

df$predict <- predict(modelzinb)

ggplot(df, aes(x=fertilizer, y=predict, color = fertilizer)) + theme_bw() + stat_summary(aes(color = fertilizer),fun.y = mean, geom = "point", size = 4, position = position_dodge(0.1)) +
scale_x_discrete("Fertlizer") +
facet_wrap(.~level)

最佳答案

我不确定您想绘制什么。但我可以向您展示如何计算零膨胀负二项式模型的预测置信区间。

请注意,我无法评论拟合质量,也无法评论将此类模型拟合到您的数据是否有意义。要回答这类问题需要我不具备的特定领域知识。

  1. 让我们从将模型拟合到您的数据开始。

    library(pscl)
    model <- formula(repro ~ fertilizer + level | fertilizer * level)
    modelzinb <- zeroinfl(model, dist = "negbin", link = "logit", data = df)
  2. 我们可以使用 predict获得响应的模型估计值。

    resp <- predict(modelzinb)

    请注意,在您的零膨胀 NB 模型中,predict.zeroinfl (默认情况下)返回观察计数范围内的估计平均响应 repro .

    关于置信区间 (CI),这里的“问题”是 precict.zeroinfl没有 interval允许直接计算 CI 的参数。旁注:似乎pscl::zeroinfl 用于包含该功能,请参阅 documentation for version 0.54 .也许值得就此联系包维护者。

    底线是,我们必须找到一种不同的方法来计算预测置信区间。为此,我们使用 Bootstrap 。 R 库 boot提供了所有必要的功能和工具来做到这一点。

  3. 首先,我们可以使用函数 boot::boot生成预测响应的引导复制。 boot::boot需要一个基于数据生成感兴趣数量(此处为预测响应)的函数。所以我们先定义这样一个函数stat .在这种特殊情况下 stat必须有两个参数:第一个参数是原始数据,第二个参数是行索引向量(定义数据的随机引导样本)。然后,该函数将使用自举数据来拟合模型,然后使用该模型根据完整数据预测平均响应。

    stat <- function(df, inds) {
    model <- formula(repro ~ fertilizer + level | fertilizer * level)
    predict(
    zeroinfl(model, dist = "negbin", link = "logit", data = df[inds, ]),
    newdata = df)
    }

    为了确保结果的可重复性,我们设置了一个固定的随机种子,并为估计的平均响应生成了 100 个 bootstrap 样本

    set.seed(2018)
    library(boot)
    res <- boot(df, stat, R = 100)

    输出对象reslist包含元素 t0 中完整数据的估计平均响应(用 all.equal(res$t0, predict(modelzinb)) 验证)和元素 t 中自举样本的估计平均响应(这是一个维度为 R x nrow(df) 的矩阵,其中 R 是自举样本的数量)。

  4. 剩下要做的就是根据与 bootstrap 样本拟合的模型估计的平均响应计算置信区间。为此,我们可以使用方便的函数 boot::boot.ci .该函数允许计算不同类型的 CI(基本、正常、BCa 等)。这里我们使用"basic"用于演示目的。我并不是说这是最好的选择。

    boot::boot.ci需要 index参数对应于估计平均响应向量的条目。实际间隔存储在矩阵的最后 2 列(第 4 列和第 5 列)中,存储为 listboot.ci 的元素返回对象。

    CI <- setNames(as.data.frame(t(sapply(1:nrow(df), function(row)
    boot.ci(res, conf = 0.95, type = "basic", index = row)$basic[, 4:5]))),
    c("lower", "upper"))
  5. 现在我们已经完成了,我们可以将 CI 列绑定(bind)到原始数​​据,包括预测平均响应值。

    df_all <- cbind.data.frame(df, response = predict(modelzinb), CI)
    # fertilizer level repro response lower upper
    #1 N low 0 29.72057 5.876731 46.48165
    #2 N low 90 29.72057 5.876731 46.48165
    #3 N high 2 21.99345 -15.228956 38.86421
    #4 N high 4 21.99345 -15.228956 38.86421
    #5 N low 0 29.72057 5.876731 46.48165
    #6 N low 80 29.72057 5.876731 46.48165
    #7 N high 1 21.99345 -15.228956 38.86421
    #8 N high 90 21.99345 -15.228956 38.86421
    #9 N low 2 29.72057 5.876731 46.48165
    #10 N low 33 29.72057 5.876731 46.48165
    #11 N high 56 21.99345 -15.228956 38.86421
    #12 N high 0 21.99345 -15.228956 38.86421
    #13 P low 99 24.22626 -9.225827 46.17656
    #14 P low 100 24.22626 -9.225827 46.17656
    #15 P high 66 22.71826 2.595246 39.88333
    #16 P high 80 22.71826 2.595246 39.88333
    #17 P low 1 24.22626 -9.225827 46.17656
    #18 P low 0 24.22626 -9.225827 46.17656
    #19 P high 2 22.71826 2.595246 39.88333
    #20 P high 33 22.71826 2.595246 39.88333
    #21 P low 0 24.22626 -9.225827 46.17656
    #22 P low 0 24.22626 -9.225827 46.17656
    #23 P high 1 22.71826 2.595246 39.88333
    #24 P high 2 22.71826 2.595246 39.88333
    #25 N low 90 29.72057 5.876731 46.48165
    #26 N low 5 29.72057 5.876731 46.48165
    #27 N high 2 21.99345 -15.228956 38.86421
    #28 N high 2 21.99345 -15.228956 38.86421
    #29 N low 5 29.72057 5.876731 46.48165
    #30 N low 8 29.72057 5.876731 46.48165
    #31 N high 0 21.99345 -15.228956 38.86421
    #32 N high 1 21.99345 -15.228956 38.86421
    #33 N low 90 29.72057 5.876731 46.48165
    #34 N low 2 29.72057 5.876731 46.48165
    #35 N high 4 21.99345 -15.228956 38.86421
    #36 N high 66 21.99345 -15.228956 38.86421
    #37 P low 0 24.22626 -9.225827 46.17656
    #38 P low 0 24.22626 -9.225827 46.17656
    #39 P high 0 22.71826 2.595246 39.88333
    #40 P high 0 22.71826 2.595246 39.88333
    #41 P low 1 24.22626 -9.225827 46.17656
    #42 P low 2 24.22626 -9.225827 46.17656
    #43 P high 90 22.71826 2.595246 39.88333
    #44 P high 5 22.71826 2.595246 39.88333
    #45 P low 2 24.22626 -9.225827 46.17656
    #46 P low 5 24.22626 -9.225827 46.17656
    #47 P high 8 22.71826 2.595246 39.88333
    #48 P low 55 24.22626 -9.225827 46.17656
    #...

示例数据

df <- data.frame(
fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"),
level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"),
repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55))

关于r - 如何在 ggplot2 R 中为零膨胀负二项式函数添加置信区间?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58002334/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com