- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我有一个数据集,它被建模为具有混合效应的零膨胀负二项式。我想从模型预测中获取置信区间并绘制模型均值和置信区间。我试图绘制模型均值。有人可以让我知道这是否是正确的方法吗?我不知道如何将模型的置信区间绘制到 ggplot2 上。我想绘制数据的预测平均值及其置信区间。我在下面的代码中对图表进行了基本尝试。
library(pscl)
library(lmtest)
df <- data.frame(
fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"),
level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"),
repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55)
)
model <- formula(repro ∼ fertilizer + level | fertilizer * level)
modelzinb <- zeroinfl(model, dist = "negbin", link = "logit",data =df)
summary(modelzinb)
df$predict <- predict(modelzinb)
ggplot(df, aes(x=fertilizer, y=predict, color = fertilizer)) + theme_bw() + stat_summary(aes(color = fertilizer),fun.y = mean, geom = "point", size = 4, position = position_dodge(0.1)) +
scale_x_discrete("Fertlizer") +
facet_wrap(.~level)
最佳答案
我不确定您想绘制什么。但我可以向您展示如何计算零膨胀负二项式模型的预测置信区间。
请注意,我无法评论拟合质量,也无法评论将此类模型拟合到您的数据是否有意义。要回答这类问题需要我不具备的特定领域知识。
让我们从将模型拟合到您的数据开始。
library(pscl)
model <- formula(repro ~ fertilizer + level | fertilizer * level)
modelzinb <- zeroinfl(model, dist = "negbin", link = "logit", data = df)
我们可以使用 predict
获得响应的模型估计值。
resp <- predict(modelzinb)
请注意,在您的零膨胀 NB 模型中,predict.zeroinfl
(默认情况下)返回观察计数范围内的估计平均响应 repro
.
关于置信区间 (CI),这里的“问题”是 precict.zeroinfl
没有 interval
允许直接计算 CI 的参数。旁注:似乎pscl::zeroinfl
用于包含该功能,请参阅 documentation for version 0.54 .也许值得就此联系包维护者。
底线是,我们必须找到一种不同的方法来计算预测置信区间。为此,我们使用 Bootstrap 。 R 库 boot
提供了所有必要的功能和工具来做到这一点。
首先,我们可以使用函数 boot::boot
生成预测响应的引导复制。 boot::boot
需要一个基于数据生成感兴趣数量(此处为预测响应)的函数。所以我们先定义这样一个函数stat
.在这种特殊情况下 stat
必须有两个参数:第一个参数是原始数据,第二个参数是行索引向量(定义数据的随机引导样本)。然后,该函数将使用自举数据来拟合模型,然后使用该模型根据完整数据预测平均响应。
stat <- function(df, inds) {
model <- formula(repro ~ fertilizer + level | fertilizer * level)
predict(
zeroinfl(model, dist = "negbin", link = "logit", data = df[inds, ]),
newdata = df)
}
为了确保结果的可重复性,我们设置了一个固定的随机种子,并为估计的平均响应生成了 100 个 bootstrap 样本
set.seed(2018)
library(boot)
res <- boot(df, stat, R = 100)
输出对象res
是 list
包含元素 t0
中完整数据的估计平均响应(用 all.equal(res$t0, predict(modelzinb))
验证)和元素 t
中自举样本的估计平均响应(这是一个维度为 R x nrow(df)
的矩阵,其中 R
是自举样本的数量)。
剩下要做的就是根据与 bootstrap 样本拟合的模型估计的平均响应计算置信区间。为此,我们可以使用方便的函数 boot::boot.ci
.该函数允许计算不同类型的 CI(基本、正常、BCa 等)。这里我们使用"basic"
用于演示目的。我并不是说这是最好的选择。
boot::boot.ci
需要 index
参数对应于估计平均响应向量的条目。实际间隔存储在矩阵的最后 2 列(第 4 列和第 5 列)中,存储为 list
。 boot.ci
的元素返回对象。
CI <- setNames(as.data.frame(t(sapply(1:nrow(df), function(row)
boot.ci(res, conf = 0.95, type = "basic", index = row)$basic[, 4:5]))),
c("lower", "upper"))
现在我们已经完成了,我们可以将 CI 列绑定(bind)到原始数据,包括预测平均响应值。
df_all <- cbind.data.frame(df, response = predict(modelzinb), CI)
# fertilizer level repro response lower upper
#1 N low 0 29.72057 5.876731 46.48165
#2 N low 90 29.72057 5.876731 46.48165
#3 N high 2 21.99345 -15.228956 38.86421
#4 N high 4 21.99345 -15.228956 38.86421
#5 N low 0 29.72057 5.876731 46.48165
#6 N low 80 29.72057 5.876731 46.48165
#7 N high 1 21.99345 -15.228956 38.86421
#8 N high 90 21.99345 -15.228956 38.86421
#9 N low 2 29.72057 5.876731 46.48165
#10 N low 33 29.72057 5.876731 46.48165
#11 N high 56 21.99345 -15.228956 38.86421
#12 N high 0 21.99345 -15.228956 38.86421
#13 P low 99 24.22626 -9.225827 46.17656
#14 P low 100 24.22626 -9.225827 46.17656
#15 P high 66 22.71826 2.595246 39.88333
#16 P high 80 22.71826 2.595246 39.88333
#17 P low 1 24.22626 -9.225827 46.17656
#18 P low 0 24.22626 -9.225827 46.17656
#19 P high 2 22.71826 2.595246 39.88333
#20 P high 33 22.71826 2.595246 39.88333
#21 P low 0 24.22626 -9.225827 46.17656
#22 P low 0 24.22626 -9.225827 46.17656
#23 P high 1 22.71826 2.595246 39.88333
#24 P high 2 22.71826 2.595246 39.88333
#25 N low 90 29.72057 5.876731 46.48165
#26 N low 5 29.72057 5.876731 46.48165
#27 N high 2 21.99345 -15.228956 38.86421
#28 N high 2 21.99345 -15.228956 38.86421
#29 N low 5 29.72057 5.876731 46.48165
#30 N low 8 29.72057 5.876731 46.48165
#31 N high 0 21.99345 -15.228956 38.86421
#32 N high 1 21.99345 -15.228956 38.86421
#33 N low 90 29.72057 5.876731 46.48165
#34 N low 2 29.72057 5.876731 46.48165
#35 N high 4 21.99345 -15.228956 38.86421
#36 N high 66 21.99345 -15.228956 38.86421
#37 P low 0 24.22626 -9.225827 46.17656
#38 P low 0 24.22626 -9.225827 46.17656
#39 P high 0 22.71826 2.595246 39.88333
#40 P high 0 22.71826 2.595246 39.88333
#41 P low 1 24.22626 -9.225827 46.17656
#42 P low 2 24.22626 -9.225827 46.17656
#43 P high 90 22.71826 2.595246 39.88333
#44 P high 5 22.71826 2.595246 39.88333
#45 P low 2 24.22626 -9.225827 46.17656
#46 P low 5 24.22626 -9.225827 46.17656
#47 P high 8 22.71826 2.595246 39.88333
#48 P low 55 24.22626 -9.225827 46.17656
#...
df <- data.frame(
fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"),
level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"),
repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55))
关于r - 如何在 ggplot2 R 中为零膨胀负二项式函数添加置信区间?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58002334/
我是一名优秀的程序员,十分优秀!