r - 有哪些替代方法可以在公式中指定二项式成功/试验？-6ren

r - 有哪些替代方法可以在公式中指定二项式成功/试验？

转载作者：行者123 更新时间：2023-12-02 07:43:25

假设您正在对二项式数据进行建模，其中每个响应都是来自带有一些解释变量(a 和 b)的多次试验 (N) 的多次成功 (y)。有几个函数可以做这种事情，它们似乎都使用不同的方法来指定 y 和 N。

在glm中，您执行glm(cbind(y,N-y)~a+b, data = d)(LHS成功/失败矩阵)

在 inla 中，您执行 inla(y~a+b, NTrials=d$N, data=d) (分别指定试验次数)

在 glmmBUGS 中，您执行 glmmBUGS(y+N~a+b,data=d) (将成功 + 试验指定为 LHS 的术语)

在编写新方法时，我一直认为最好遵循 glm 的做法，因为这是人们通常首先遇到二项式响应数据的地方。但是，我永远记不起它是 cbind(y,N-y) 还是 cbind(y,N) - 而且我的数据中通常似乎有成功/试验次数而不是成功/失败的次数 - YMMV。

当然，其他方法也是可能的。例如，使用 RHS 上的函数来标记变量是试验次数还是失败次数:

myblm( y ~ a + b + Ntrials(N), data=d)
myblm( y ~ a + b + Nfails(M), data=d)  # if your dataset has succ/fail variables

或者定义一个运算符来执行 cbind，这样你就可以这样做:

myblm( y %of% N ~ a + b, data=d)

从而为 LHS 附加一些含义，使其变得明确。

有人有更好的想法吗？正确的做法是什么？

最佳答案

您还可以让y为分数，在这种情况下您需要提供权重。它不在formula 参数中，但击键次数几乎与在formula 中相同。这是一个例子

> set.seed(73574836)
> x <- runif(10)
> n <- sample.int(10, 2)
> y <- sapply(mapply(rbinom, size = 1, n, (1 + exp(1 - x))^-1), function(x) 
+   sum(x == 1))
> df <- data.frame(y = y, frac = y / n, x = x, weights = n)
> df
   y  frac      x weights
1  2 1.000 0.9051       2
2  5 0.625 0.3999       8
3  1 0.500 0.4649       2
4  4 0.500 0.5558       8
5  0 0.000 0.8932       2
6  3 0.375 0.1825       8
7  1 0.500 0.1879       2
8  4 0.500 0.5041       8
9  0 0.000 0.5070       2
10 3 0.375 0.3379       8
> 
> # the following two fits are identical
> summary(glm(cbind(y, weights - y) ~ x, binomial(), df))

Call:
glm(formula = cbind(y, weights - y) ~ x, family = binomial(), 
    data = df)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.731  -0.374   0.114   0.204   1.596  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.416      0.722   -0.58     0.56
x              0.588      1.522    0.39     0.70

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9.5135  on 9  degrees of freedom
Residual deviance: 9.3639  on 8  degrees of freedom
AIC: 28.93

Number of Fisher Scoring iterations: 3

> summary(glm(frac ~ x, binomial(), df, weights = weights))

Call:
glm(formula = frac ~ x, family = binomial(), data = df, weights = weights)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.731  -0.374   0.114   0.204   1.596  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.416      0.722   -0.58     0.56
x              0.588      1.522    0.39     0.70

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9.5135  on 9  degrees of freedom
Residual deviance: 9.3639  on 8  degrees of freedom
AIC: 28.93

Number of Fisher Scoring iterations: 3

上述工作的原因归结为 glm 对二项式结果的实际作用。无论您如何指定结果，它都会计算每个观察的分数以及与观察相关的权重。这是来自 ?glm 的片段，它给出了估计中发生的情况的提示

If a binomial glm model was specified by giving a two-column response, the weights returned by prior.weights are the total numbers of cases (factored by the supplied case weights) and the component y of the result is the proportion of successes.

或者，您可以使用 model.frame 为 glm.fit 或 glm 制作包装器。请参阅 ?model.frame

中的 ... 参数

... for model.frame methods, a mix of further arguments such as data, na.action, subset to pass to the default method. Any additional arguments (such as offset and weights or other named arguments) which reach the default method are used to create further columns in the model frame, with parenthesised names such as "(offset)".

我后来看到了 Ben Bolker 的评论。以上是他指出的。

关于r - 有哪些替代方法可以在公式中指定二项式成功/试验？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/19567905/

文章推荐： java - 如何使用 Java 流求除数之和？

文章推荐： javascript - jquery 结构 - 每个文件一个函数

文章推荐： Django REST框架: Get ID/URL during validation?

r - 二项式，二项式()和 'binomial'之间的区别
使用 glm 时，binomial、binomial() 和 'binomial' 之间有什么区别。它们并不相同，如以下代码所示: > library(MASS) > bwdf = birthwt[-
r - 二项式 glmer() 预测的分类精度
我一直在用我的(非 r-savvy)大脑来让 R 产生二项式 glmer 模型的正确预测的百分比。我知道这不是统计上的 super 信息，但经常被报道；所以我也想举报。数据: 因变量:Tipo，它有
python - 拟合 beta 二项式
我一直在寻找一种方法来使数据符合 beta 二项分布并估计 alpha 和 beta，类似于 VGAM 库中的 vglm 包的方式。我一直无法找到如何在 python 中执行此操作。有一个 scipy
julia - 如何从 Julia 中拟合的 GLM 模型对象中提取数据分布类型(例如泊松、二项式)？
如何在 Julia 中提取一般线性模型中指定的数据分布？例如，下面我安装了一个玩具示例 Poisson GLM。我想从模型中提取一个字符串“Poisson”。同样，如果使用数据分布指定模型 = Bin

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

r - 有哪些替代方法可以在公式中指定二项式成功/试验？

评论