r - 三明治+ mlogit : `Error in ef/X : non-conformable arrays` when using `vcovHC()` to compute robust/clustered standard errors-6ren

r - 三明治+ mlogit : `Error in ef/X : non-conformable arrays` when using `vcovHC()` to compute robust/clustered standard errors

转载作者：行者123 更新时间：2023-12-04 08:02:03

在离散选择问题中使用 mlogit() 拟合多项式 Logit (MNL) 后，我试图计算稳健/集群标准误差。不幸的是，我怀疑我遇到了问题，因为我使用的是 long 格式的数据(在我的情况下这是必须的)，并且在 #Error in ef/X : non-conformable arrays 之后收到错误 sandwich::vcovHC( , "HC0") 。

数据
为了说明，请仔细考虑以下数据。它表示来自 5 个个体 ( id_ind ) 的数据，这些个体在 3 个备选方案 ( altern ) 中进行选择。五个人各选择三遍；因此我们有 15 种选择情况( id_choice )。每个备选方案由两个通用属性( x1 和 x2 )表示，并且选项注册在 y 中(如果选择了 1，否则为 0)。

df <- read.table(header = TRUE, text = "
id_ind id_choice altern           x1          x2 y
1       1         1      1  1.586788801  0.11887832 1
2       1         1      2 -0.937965347  1.15742493 0
3       1         1      3 -0.511504401 -1.90667519 0
4       1         2      1  1.079365680 -0.37267925 0
5       1         2      2 -0.009203032  1.65150370 1
6       1         2      3  0.870474033 -0.82558651 0
7       1         3      1 -0.638604013 -0.09459502 0
8       1         3      2 -0.071679538  1.56879334 0
9       1         3      3  0.398263302  1.45735788 1
10      2         4      1  0.291413453 -0.09107974 0
11      2         4      2  1.632831160  0.92925495 0
12      2         4      3 -1.193272276  0.77092623 1
13      2         5      1  1.967624379 -0.16373709 1
14      2         5      2 -0.479859282 -0.67042130 0
15      2         5      3  1.109780885  0.60348187 0
16      2         6      1 -0.025834772 -0.44004183 0
17      2         6      2 -1.255129594  1.10928280 0
18      2         6      3  1.309493274  1.84247199 1
19      3         7      1  1.593558740 -0.08952151 0
20      3         7      2  1.778701074  1.44483791 1
21      3         7      3  0.643191170 -0.24761157 0
22      3         8      1  1.738820924 -0.96793288 0
23      3         8      2 -1.151429915 -0.08581901 0
24      3         8      3  0.606695064  1.06524268 1
25      3         9      1  0.673866953 -0.26136206 0
26      3         9      2  1.176959443  0.85005871 1
27      3         9      3 -1.568225496 -0.40002252 0
28      4        10      1  0.516456176 -1.02081089 1
29      4        10      2 -1.752854918 -1.71728381 0
30      4        10      3 -1.176101700 -1.60213536 0
31      4        11      1 -1.497779616 -1.66301234 0
32      4        11      2 -0.931117325  1.50128532 1
33      4        11      3 -0.455543630 -0.64370825 0
34      4        12      1  0.894843784 -0.69859139 0
35      4        12      2 -0.354902281  1.02834859 0
36      4        12      3  1.283785176 -1.18923098 1
37      5        13      1 -1.293772990 -0.73491317 0
38      5        13      2  0.748091387  0.07453705 1
39      5        13      3 -0.463585127  0.64802031 0
40      5        14      1 -1.946438667  1.35776140 0
41      5        14      2 -0.470448172 -0.61326604 1
42      5        14      3  1.478763383 -0.66490028 0
43      5        15      1  0.588240775  0.84448489 1
44      5        15      2  1.131731049 -1.51323232 0
45      5        15      3  0.212145247 -1.01804594 0
")

问题
因此，我们可以使用 mlogit() 拟合 MNL 并提取它们的稳健方差-协方差，如下所示:

library(mlogit)
library(sandwich)
mo <-  mlogit(formula = y ~ x1 + x2|0 , 
              method ="nr",  
              data =  df,
              idx  =  c("id_choice", "altern"))

sandwich::vcovHC(mo, "HC0")
#Error in ef/X : non-conformable arrays

正如我们所看到的， sandwich::vcovHC 产生了一个错误，它表明 ef/X 是不合规的。其中 X <- model.matrix(x) 和 ef <- estfun(x, ...) 。在查看 the source code on the mirror on GitHub 之后，我发现了一个问题，因为数据是长格式的， ef 的维度为 15 x 2 ， X 的维度为 45 x 2 。

我的解决方法
鉴于节目必须继续，我正在使用我从 sandwich 和 I adjusted to accommodate the Stata's output. 借用的一些函数手动计算稳健和集群标准错误

> 稳健的标准误差
这些行的灵感来自 sandwich::meat() 函数。

psi<- estfun(mo)
k <- NCOL(psi)
n <- NROW(psi)
rval <-  (n/(n-1))* crossprod(as.matrix(psi))
vcov(mo) %*% rval %*% vcov(mo)

#            x1         x2
# x1 0.23050261 0.09840356
# x2 0.09840356 0.12765662

等价于Stata

qui clogit y x1 x2 ,group(id_choice) r
mat li e(V)
symmetric e(V)[2,2]
            y:         y:
            x1         x2
y:x1  .23050262
y:x2  .09840356  .12765662

> 聚类标准误差
在这里，鉴于每个人回答 3 个问题，个体之间很可能存在某种程度的相关性；因此在这种情况下应该首选集群校正。下面我计算了这种情况下的集群校正，并展示了与 clogit , cluster() 的 Stata 输出的等效性。

id_ind_collapsed <- df$id_ind[!duplicated(mo$model$idx$id_choice,)]
psi_2 <- rowsum(psi, group = id_ind_collapsed )

k_cluster <- NCOL(psi_2)
n_cluster <- NROW(psi_2)
rval_cluster <-  (n_cluster/(n_cluster-1))* crossprod(as.matrix(psi_2))
vcov(mo) %*% rval_cluster %*% vcov(mo)

#           x1        x2
# x1 0.1766707 0.1007703
# x2 0.1007703 0.1180004

等价于Stata

qui clogit y x1 x2 ,group(id_choice) cluster(id_ind)
symmetric e(V)[2,2]
            y:         y:
            x1         x2
y:x1  .17667075
y:x2   .1007703  .11800038

问题:
我想在 sandwich 生态系统中容纳我的计算，这意味着不是手动计算矩阵，而是实际使用 sandwich 函数。是否可以使其与此处描述的长格式模型一起使用？例如，直接提供 meat 和 bread 对象来执行计算？提前致谢。

PS:我注意到 there is a dedicated bread function in sandwich for mlogit ，但我找不到 meat 之类的东西 mlogit ，但无论如何我可能在这里遗漏了一些东西......

最佳答案

为什么 vcovHC 对 mlogit 不起作用
HC 协方差估计器类只能应用于具有单个线性预测器的模型，其中评分函数又名估计函数是所谓的“工作残差”和回归矩阵的乘积。 Zeileis (2006) 论文(参见公式 7)对此进行了详细解释，提供为 vignette("sandwich-OOP", package = "sandwich")在包中。 ?vcovHC也指出了这一点，但没有很好地解释。我在文档中对此进行了改进 http://sandwich.R-Forge.R-project.org/reference/vcovHC.html现在:

The function meatHC is the real work horse for estimating the meat of HC sandwich estimators - the default vcovHC method is a wrapper calling sandwich and bread. See Zeileis (2006) for more implementation details. The theoretical background, exemplified for the linear regression model, is described below and in Zeileis (2004). Analogous formulas are employed for other types of models, provided that they depend on a single linear predictor and the estimating functions can be represented as a product of “working residual” and regressor vector (Zeileis 2006, Equation 7).

这意味着 vcovHC()不适用于多项 logit 模型，因为它们通常对单独的响应类别使用单独的线性预测变量。同样，不支持两部分或跨栏模型等。
基本的“稳健”三明治协方差
通常，为了计算基本的 Eicker-Huber-White 三明治协方差矩阵估计量，最好的策略是使用 sandwich()函数而不是 vcovHC()功能。前者适用于任何带有 estfun() 的模型和 bread()方法。
对于线性模型 sandwich(..., adjust = FALSE) (默认)和 sandwich(..., adjust = TRUE)分别对应于 HC0 和 HC1。在带有 n 的模型中观察和 k前者用 1/n 标准化的回归系数后者是 1/(n-k) .
然而，Stata 除以 1/(n-1)在 logit 模型中，请参阅:
Different Robust Standard Errors of Logit Regression in Stata and R .据我所知，没有明确的理论理由专门使用一种或另一种调整。并且已经在中等大的样本中，这无论如何都没有区别。
备注:与 1/(n-1)的调整在 sandwich() 中不直接可用作为一种选择。但是，巧合的是，它是 vcovCL() 中的默认值不指定 cluster变量(即，将每个观察值视为一个单独的集群)。因此，如果您想获得与 Stata 完全相同的结果，这是一个方便的“技巧”。
聚类协方差
这可以通过 vcovCL(..., cluster = ...)“照常”计算.对于 mlogit模型你只需要考虑 cluster变量只需要提供一次(而不是在长格式中多次堆叠)。
复制Stata结果
使用您帖子中的数据和模型:

vcovCL(mo)
##            x1         x2
## x1 0.23050261 0.09840356
## x2 0.09840356 0.12765662
vcovCL(mo, cluster = df$id_choice[1:15])
##           x1        x2
## x1 0.1766707 0.1007703
## x2 0.1007703 0.1180004

关于r - 三明治+ mlogit : `Error in ef/X : non-conformable arrays` when using `vcovHC()` to compute robust/clustered standard errors，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/66412110/

文章推荐： r - 当方程相互依赖时，替代 for 循环进行快速计算

文章推荐： javascript - 如何在 vue 3 中从 setup 调用 prop 函数

文章推荐： c# - 我的幻灯片拼图的 9 个按钮有问题

R计算具有奇点的lm模型的稳健标准误差(vcovHC)
在 R 中，当某些系数由于奇异性而被丢弃时，如何使用 vcovHC() 计算稳健的标准误差？标准 lm 函数似乎可以很好地计算实际估计的所有系数的正常标准误差，但 vcovHC() 会引发错误:“面包
r - vcovHC 和置信区间
是否可以在拟合模型后使用由 vcovHC(来自三明治包)获得的稳健 vcov？最佳答案不，您不能直接将函数 confint 与强大的 vcov 一起使用。但手动执行此操作非常简单。 x <- si
坚固耐用。 (vcovHC) 在 R 中用 texreg 显示
我正在使用 plm 包进行一些回归，然后如果需要，我还可以获得异方差一致系数。以下是我运行的命令； library(plm) data("Produc", package = "plm")
r - vcovHC::sandwich () 和 coeftest::lmtest() 返回 NA 值
我目前正在构建一个回归模型，该模型有助于使用收入、温度等某些因素来解释销售额。在回归后检查残差图时，残差是异方差的。为了说明异方差，我在 R 中使用了 vcovHC() 和 coeftest()，它
r - 三明治+ mlogit : `Error in ef/X : non-conformable arrays` when using `vcovHC()` to compute robust/clustered standard errors
在离散选择问题中使用 mlogit() 拟合多项式 Logit (MNL) 后，我试图计算稳健/集群标准误差。不幸的是，我怀疑我遇到了问题，因为我使用的是 long 格式的数据(在我的情况下这是必须的

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

r - 三明治+ mlogit : `Error in ef/X : non-conformable arrays` when using `vcovHC()` to compute robust/clustered standard errors