
r - Is there a way to plot hurdle model results from the pscl package, or to plot the zero-truncated negbin (count) part of a hurdle model in R?

Reposted · Author: 行者123 · Updated: 2023-12-05 04:20:31

Is there a way to plot hurdle model results in R? I was able to plot the zero part of the hurdle model (binomial with logit link) (below), but I can't figure out how to plot the count part of the model (truncated negative binomial with log link). I am using the pscl package for the hurdle model.

Example data (data frame name = data):

structure(list(Byths = c(333L, 107L, 0L, 0L, 684L, 0L, 113L, 
0L, 0L, 20L, 251L, 20L, 0L, 0L, 32L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 182L, 30L, 0L, 0L, 183L, 0L, 0L, 0L, 8L, 213L, 108L,
310L, 960L, 720L, 0L, 0L, 6L, 72L, 78L, 15L, 196L, 256L, 608L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 105L, 0L, 194L, 0L, 440L, 0L, 0L,
0L, 0L, 0L, 18L, 0L, 239L, 262L, 0L, 102L, 17L, 0L, 0L, 0L, 0L,
0L, 68L, 93L, 0L, 226L, 118L, 91L, 330L, 104L, 68L, 224L, 0L,
0L, 18L, 79L, 71L, 8L, 73L, 38L, 39L, 7L), Season = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Pre",
"Peak", "Post"), class = "factor"), depthm = c(20.1170446232626,
9.1441111923921, 18.2882223847842, 18.2882223847842, 17.373811265545,
13.7161667885881, 13.7161667885881, 13.106559375762, 13.106559375762,
8.53450377956596, 8.53450377956596, 12.4969519629359, 20.1170446232626,
20.1170446232626, 8.83930748597903, 19.2026335040234, 19.2026335040234,
20.1170446232626, 20.1170446232626, 20.1170446232626, 20.1170446232626,
20.1170446232626, 14.0209704950012, 14.0209704950012, 14.0209704950012,
14.0209704950012, 14.0209704950012, 8.53450377956596, 12.8017556693489,
20.7266520360888, 20.7266520360888, 17.373811265545, 14.0209704950012,
14.0209704950012, 14.0209704950012, 8.22970007315289, 8.22970007315289,
18.5930260911973, 17.373811265545, 8.53450377956596, 7.92489636673982,
7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982,
7.92489636673982, 9.44891489880517, 9.44891489880517, 9.44891489880517,
9.44891489880517, 15.8497927334796, 12.1921482565228, 8.83930748597903,
8.83930748597903, 18.8978297976103, 8.83930748597903, 8.83930748597903,
17.0690075591319, 20.4218483296757, 20.4218483296757, 20.1170446232626,
20.1170446232626, 13.106559375762, 13.106559375762, 13.106559375762,
13.106559375762, 13.106559375762, 13.106559375762, 8.83930748597903,
10.6681297244574, 9.75371860521824, 9.75371860521824, 9.75371860521824,
11.5825408436967, 9.1441111923921, 17.0690075591319, 17.0690075591319,
17.0690075591319, 17.0690075591319, 17.0690075591319, 17.0690075591319,
20.1170446232626, 10.0585223116313, 17.373811265545, 20.1170446232626,
20.4218483296757, 20.4218483296757, 7.92489636673982, 7.92489636673982,
7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982,
7.92489636673982, 7.92489636673982)), row.names = c(5L, 7L, 12L,
13L, 14L, 16L, 18L, 21L, 22L, 23L, 24L, 26L, 28L, 29L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 46L, 48L, 49L, 50L, 51L, 52L, 53L,
54L, 55L, 56L, 57L, 59L, 61L, 62L, 63L, 65L, 66L, 71L, 73L, 75L,
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 92L, 95L, 98L,
107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L,
118L, 119L, 125L, 137L, 139L, 142L, 143L, 144L, 147L, 150L, 151L,
152L, 154L, 156L, 157L, 159L, 160L, 161L, 162L, 163L, 165L, 177L,
178L, 179L, 180L, 181L, 183L, 184L, 185L), class = "data.frame")

Model:

mod <- hurdle(Byths ~ depthm * Season, data = data, dist = "negbin")

summary(mod)

This gives:

Call:
hurdle(formula = Byths ~ depthm * Season, data = data, dist = "negbin")

Pearson residuals:
Min 1Q Median 3Q Max
-0.8326 -0.5676 -0.2256 0.1534 4.7222

Count model coefficients (truncated negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.88538 0.63004 9.341 < 2e-16 ***
depthm -0.05014 0.05577 -0.899 0.368658
SeasonPeak -3.03180 0.86315 -3.512 0.000444 ***
SeasonPost -2.74270 1.95696 -1.402 0.161061
depthm:SeasonPeak 0.17058 0.07106 2.400 0.016380 *
depthm:SeasonPost 0.21029 0.13113 1.604 0.108783
Log(theta) 0.13512 0.19772 0.683 0.494367
Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.02258 1.10922 3.627 0.000287 ***
depthm -0.32024 0.08317 -3.851 0.000118 ***
SeasonPeak -2.55477 1.65557 -1.543 0.122798
SeasonPost -3.94742 3.35163 -1.178 0.238892
depthm:SeasonPeak 0.27367 0.12088 2.264 0.023569 *
depthm:SeasonPost 0.30061 0.21990 1.367 0.171617
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta: count = 1.1447
Number of iterations in BFGS optimization: 15
Log-likelihood: -344.6 on 13 Df

I exponentiated the coefficients to interpret them, but I would like to make some figures for my thesis.
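For reference, the exponentiation step can be sketched like this (a sketch only; it assumes pscl's `coef`/`vcov` methods for hurdle objects and that `vcov(mod)` is ordered the same way as `coef(mod)`, with `mod` being the fit above):

```r
# Exponentiated coefficients: rate ratios for the truncated-count part,
# odds ratios for the zero-hurdle (binomial) part
exp(coef(mod, model = "count"))
exp(coef(mod, model = "zero"))

# Rough Wald 95% intervals on the exponentiated scale
# (assumes vcov(mod) rows match the order of coef(mod))
est <- coef(mod)
se  <- sqrt(diag(vcov(mod)))
exp(cbind(estimate = est, lwr = est - 1.96 * se, upr = est + 1.96 * se))
```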

The only way I have figured out how to plot the zero part of the hurdle model is to do the following:

# turned counts into presence/absence

structure(list(Byths = c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1,
1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
Season = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Pre",
"Peak", "Post"), class = "factor"), depthm = c(20.1170446232626,
9.1441111923921, 18.2882223847842, 18.2882223847842, 17.373811265545,
13.7161667885881, 13.7161667885881, 13.106559375762, 13.106559375762,
8.53450377956596, 8.53450377956596, 12.4969519629359, 20.1170446232626,
20.1170446232626, 8.83930748597903, 19.2026335040234, 19.2026335040234,
20.1170446232626, 20.1170446232626, 20.1170446232626, 20.1170446232626,
20.1170446232626, 14.0209704950012, 14.0209704950012, 14.0209704950012,
14.0209704950012, 14.0209704950012, 8.53450377956596, 12.8017556693489,
20.7266520360888, 20.7266520360888, 17.373811265545, 14.0209704950012,
14.0209704950012, 14.0209704950012, 8.22970007315289, 8.22970007315289,
18.5930260911973, 17.373811265545, 8.53450377956596, 7.92489636673982,
7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982,
7.92489636673982, 9.44891489880517, 9.44891489880517, 9.44891489880517,
9.44891489880517, 15.8497927334796, 12.1921482565228, 8.83930748597903,
8.83930748597903, 18.8978297976103, 8.83930748597903, 8.83930748597903,
17.0690075591319, 20.4218483296757, 20.4218483296757, 20.1170446232626,
20.1170446232626, 13.106559375762, 13.106559375762, 13.106559375762,
13.106559375762, 13.106559375762, 13.106559375762, 8.83930748597903,
10.6681297244574, 9.75371860521824, 9.75371860521824, 9.75371860521824,
11.5825408436967, 9.1441111923921, 17.0690075591319, 17.0690075591319,
17.0690075591319, 17.0690075591319, 17.0690075591319, 17.0690075591319,
20.1170446232626, 10.0585223116313, 17.373811265545, 20.1170446232626,
20.4218483296757, 20.4218483296757, 7.92489636673982, 7.92489636673982,
7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982,
7.92489636673982, 7.92489636673982)), row.names = c(5L, 7L,
12L, 13L, 14L, 16L, 18L, 21L, 22L, 23L, 24L, 26L, 28L, 29L, 32L,
33L, 34L, 35L, 36L, 37L, 38L, 39L, 46L, 48L, 49L, 50L, 51L, 52L,
53L, 54L, 55L, 56L, 57L, 59L, 61L, 62L, 63L, 65L, 66L, 71L, 73L,
75L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 92L, 95L,
98L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L,
117L, 118L, 119L, 125L, 137L, 139L, 142L, 143L, 144L, 147L, 150L,
151L, 152L, 154L, 156L, 157L, 159L, 160L, 161L, 162L, 163L, 165L,
177L, 178L, 179L, 180L, 181L, 183L, 184L, 185L), class = "data.frame")

Then plot the hurdle part of the model...

ggplot(dataPA, aes(x = depthm, y = Byths)) +
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), se = TRUE) +
  facet_wrap(~ Season)

(plot: binomial / presence-absence part of the model)

Is there a way to plot the count model part of the hurdle model (the truncated negbin)? Or a better way to plot hurdle model results overall?

Should I use predict.hurdle for this and plot the predicted values? If so, how do I do that? The examples I have found are not clear enough for me to follow, and I have been trying for weeks without success.

Thanks!

Best answer

Plotting the count part of the model

You can easily do the same thing you did for the hurdle (zero) part, but now for the count part of the model. You just need a negative binomial model fit only to the data where Byths > 0. One wrinkle is that glm() has no family for the negative binomial, so you need MASS::glm.nb().

Here you go:

ggplot(data[data$Byths != 0, ], aes(x = depthm, y = Byths)) +
  geom_point() +
  stat_smooth(method = MASS::glm.nb, se = TRUE) +
  facet_wrap(~ Season)

(plot: truncated-count part of the model, glm.nb fit per season)

Now I wanted to make sure this gives the same result as using the package author's predict method. The good news is that it does, and a bonus of the approach above is that we also easily get confidence intervals for our predictions.

newdat <- expand.grid(
  depthm = seq(8, 20, .25),
  Season = c("Pre", "Peak", "Post")
)
newdat$count_mean <- predict(mod, newdat, type = "count")

ggplot(newdat, aes(x = depthm, y = count_mean)) +
  geom_point() +
  facet_wrap(~ Season) +
  ylim(c(0, 1250))
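As a side note, my reading of `predict.hurdle` is that `type = "count"` is the mean of the (untruncated) count component alone, while `type = "response"` is the overall mean, and the two are linked through the `type = "zero"` ratio. A quick sanity check, under that assumption:

```r
# If type = "response" is the overall mean, it should factor as
# the hurdle ratio (type = "zero") times the count mean (type = "count")
p_resp  <- predict(mod, newdat, type = "response")
p_zero  <- predict(mod, newdat, type = "zero")
p_count <- predict(mod, newdat, type = "count")
all.equal(unname(p_resp), unname(p_zero * p_count))
```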

(plot: predicted count means from predict.hurdle, per season)

Plotting the overall model

If this were my study, I would probably want to look at my model results as a whole. So here I plot the expected count at each predictor value, combining the count and hurdle parts of the model into the overall prediction. I also wanted to see the whole predictive distribution, not just the mean. So here I plot the full distribution (at selected values) so the model predictions are clearer. For this I used the {ggdist} package (I tried to feed the model predictions into it directly but couldn't figure out how, so I drew 10,000 samples from the predictive distribution instead).

Here you go:

library(tidyverse)  # for %>%, expand_grid, mutate, nest, map, unnest
library(ggdist)

# Expected (mean) count over a fine grid of predictor values
mean_dat <- expand_grid(
  depthm = seq(8, 20, 0.25),
  Season = c("Pre", "Peak", "Post")
) %>%
  mutate(
    mean_count = predict(mod, ., type = "response")
  )

# Full predicted count distribution at a few selected depths,
# approximated by 10,000 draws from the predicted probabilities
simdat <- expand_grid(
  depthm = c(8, 12, 16, 20),
  Season = c("Pre", "Peak", "Post")
) %>%
  mutate(
    counts = predict(mod, ., type = "prob", at = 0:1000)
  ) %>%
  nest(counts) %>%
  mutate(
    sim_count = map(data, ~ sample(0:1000, size = 10000, replace = TRUE, prob = t(.x))),
    .keep = "unused"
  ) %>%
  unnest()

ggplot(mean_dat, aes(x = depthm, y = mean_count)) +
  geom_line() +
  stat_slab(aes(y = sim_count), fill = "skyblue", alpha = .5, data = simdat) +
  facet_wrap(~ Season)

(plot: expected counts with predictive distributions overlaid, per season)

The black line shows the expected count across the predictor values, and the blue distributions show the predictive distribution (which the samples come from) at some selected values.

Now, in my opinion, this visualization does a much better job of showing what the model actually predicts. Most values are still concentrated near 0, but there is a lot of variation, especially as depthm gets larger.
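To put a number on how much mass sits at zero, you could also query the predicted probability of a zero count directly (a sketch; it assumes `predict.hurdle`'s `at` argument accepts a single value, and the depths chosen are purely illustrative):

```r
# Predicted P(Byths = 0) at a few depths during the Peak season
zdat <- data.frame(
  depthm = c(8, 12, 16, 20),
  Season = factor("Peak", levels = c("Pre", "Peak", "Post"))
)
predict(mod, zdat, type = "prob", at = 0)
```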

Regarding "r - Is there a way to plot hurdle model results from the pscl package, or to plot the zero-truncated negbin (count) part of a hurdle model in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74440752/
