r - How do I produce marginal effects for a logit model when using survey weights?

Reposted by: 行者123 Updated: 2023-12-04 11:39:56

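I usually use the mfx package and its logitmfx function to produce marginal effects for logit models. However, the survey I am currently using has weights (certain populations were oversampled, which strongly affects the proportion of the DV in the sample), and logitmfx does not appear to have any way to include weights.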

I fitted the model with svyglm as follows:

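library(survey)

# Declare the design with the formula interface; columns are looked up in `data`
survey.design <- svydesign(ids = ~id,
                           weights = ~weight,
                           data = combined.survey)

# family must be set for a logit fit: svyglm() defaults to gaussian otherwise
vote.pred.1 <- svyglm(formula = turnout ~ gender + age.group +
                                    education + income,
                      design = survey.design,
                      family = quasibinomial(link = "logit"))
summary(vote.pred.1)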

How can I produce marginal effects from these results?

Best answer

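I had the same problem. Below I have modified a function from the mfx package so that it computes marginal effects from data organized as a survey object. I did not change much: mostly I replaced mean() and similar commands, which are meant for non-survey data, with their survey package equivalents. The modified mfx code is followed by code that runs an example.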
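To illustrate why the mean() replacement matters, here is a minimal base R sketch with made-up numbers (not data from the question). It assumes only that the point estimate of a survey-weighted mean is sum(w * x) / sum(w); the variable names are hypothetical.

```r
# Toy data: a 0/1 outcome and sampling weights (hypothetical values)
y <- c(1, 1, 0, 0, 0)
w <- c(10, 10, 50, 50, 100)

unweighted <- mean(y)               # every respondent counts once
weighted   <- sum(w * y) / sum(w)   # every respondent counts w times

c(unweighted = unweighted, weighted = weighted)
```

When the oversampled respondents differ on the outcome, the two numbers can be far apart, which is why the covariate means used for "marginal effects at the mean" should come from the survey design rather than from mean().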

Background

Details on the mfx package by Alan Fernihough:
https://cran.r-project.org/web/packages/mfx/mfx.pdf

The code for the mfx package on GitHub (the files I modified are probitmfxest.r and probitmfx.r):
https://github.com/cran/mfx/tree/master/R

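In the mfx calculator, I commented out much of the flexibility built into the original functions, which handles different assumptions about clustering and robust SEs. I could be wrong, but I think these issues are already taken care of by using svyglm(), the regression estimation command in the survey package.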

Marginal effects calculator

 library(survey)

 probitMfxEstSurv <-
    function(formula, 
             design, 
             atmean = TRUE, 
             robust = FALSE, 
             clustervar1 = NULL, 
             clustervar2 = NULL, 
             start = NULL
             #           control = list() # this option is found in the original mfx package
    ){

      if(is.null(formula)){
        stop("model formula is missing")
      }

      # inherits() checks the whole class vector at once
      if(!inherits(design, "survey.design")){
        stop("design argument must be a survey design object")
      }

      # from Fernihough's original mfx function
      # I don't think this is needed because the  
      # regression computed by the survey package should
      # take care of stratification and robust SEs
      # from the survey info
      # 
      #     # cluster sort part
      #     if(is.null(clustervar1) & !is.null(clustervar2)){
      #       stop("use clustervar1 arguement before clustervar2 arguement")
      #     }    
      #     if(!is.null(clustervar1)){
      #       if(is.null(clustervar2)){
      #         if(!(clustervar1 %in% names(data))){
      #           stop("clustervar1 not in data.frame object")
      #         }    
      #         data = data.frame(model.frame(formula, data, na.action=NULL),data[,clustervar1])
      #         names(data)[dim(data)[2]] = clustervar1
      #         data=na.omit(data)
      #       }
      #       if(!is.null(clustervar2)){
      #         if(!(clustervar1 %in% names(data))){
      #           stop("clustervar1 not in data.frame object")
      #         }    
      #         if(!(clustervar2 %in% names(data))){
      #           stop("clustervar2 not in data.frame object")
      #         }    
      #         data = data.frame(model.frame(formula, data, na.action=NULL),
      #                           data[,c(clustervar1,clustervar2)])
      #         names(data)[c(dim(data)[2]-1):dim(data)[2]] = c(clustervar1,clustervar2)
      #         data=na.omit(data)
      #       }
      #     }

      # fit the probit regression
      fit = svyglm(formula, 
                   design=design, 
                   family = quasibinomial(link = "probit"), 
                   x = TRUE   # spell out TRUE; T can be reassigned
      )
      # TS: summary(fit)

      # terms needed
      x1 = model.matrix(fit)
      if (any(alias <- is.na(coef(fit)))) {   # this conditional removes any vars with a NA coefficient
        x1 <- x1[, !alias, drop = FALSE]
      }

      xm = as.matrix(svymean(x1,design)) # calculate means of x variables
      be = as.matrix(na.omit(coef(fit))) # collect coefficients: be as in beta
      k1 = length(na.omit(coef(fit))) # collect number of coefficients or x variables
      xb = t(xm) %*% be # get the matrix product of xMean and beta, which is the model prediction at the mean
      # density scale factor: evaluated at the weighted means, or as the survey-weighted
      # average over observations (assumes no rows were dropped for missing values)
      fxb = ifelse(atmean, dnorm(xb), weighted.mean(as.numeric(dnorm(x1 %*% be)), weights(design)))

      # get variances
      vcv = vcov(fit)

      # from Fernihough's original mfx function
      # I don't think this is needed because the  
      # regression computed by the survey package should
      # take care of stratification and robust SEs
      # from the survey info
      # 
      #     if(robust){
      #       if(is.null(clustervar1)){
      #         # white correction
      #         vcv = vcovHC(fit, type = "HC0")
      #       } else {
      #         if(is.null(clustervar2)){
      #           vcv = clusterVCV(data=data, fm=fit, cluster1=clustervar1,cluster2=NULL)
      #         } else {
      #           vcv = clusterVCV(data=data, fm=fit, cluster1=clustervar1,cluster2=clustervar2)
      #         }
      #       }
      #     }
      #     
      #     if(robust==FALSE & is.null(clustervar1)==FALSE){
      #       if(is.null(clustervar2)){
      #         vcv = clusterVCV(data=data, fm=fit, cluster1=clustervar1,cluster2=NULL)
      #       } else {
      #         vcv = clusterVCV(data=data, fm=fit, cluster1=clustervar1,cluster2=clustervar2)
      #       }
      #     }

      # set mfx equal to predicted mean (or other value) multiplied by beta
      mfx = data.frame(mfx=fxb*be, se=NA)

      # get standard errors
      if(atmean){#    fxb *  id matrix - avg model prediction * (beta X xmean)
        gr = as.numeric(fxb)*(diag(k1) - as.numeric(xb) *(be %*% t(xm)))
        mfx$se = sqrt(diag(gr %*% vcv %*% t(gr)))            
      } else {
        gr = apply(x1, 1, function(x){
          as.numeric(as.numeric(dnorm(x %*% be))*(diag(k1) - as.numeric(x %*% be)*(be %*% t(x))))
        })
        # survey-weighted average of the per-observation gradients
        gr = matrix(apply(gr, 1, weighted.mean, w = weights(design)), nrow=k1)
        mfx$se = sqrt(diag(gr %*% vcv %*% t(gr)))                
      }

      # pick out constant and remove from mfx table
      temp1 = apply(x1,2,function(x)length(table(x))==1)
      const = names(temp1[temp1==TRUE])
      # %in% keeps all rows when the model has no constant (const is then empty)
      mfx = mfx[!(row.names(mfx) %in% const),]

      # pick out discrete change variables
      temp1 = apply(x1,2,function(x)length(table(x))==2)
      disch = names(temp1[temp1==TRUE])

      # calculate the discrete change marginal effects and standard errors
      if(length(disch)!=0){
        for(i in 1:length(disch)){
          if(atmean){
            disx0 = disx1 = xm
            disx1[disch[i],] = max(x1[,disch[i]])
            disx0[disch[i],] = min(x1[,disch[i]])
            # mfx equal to    prediction @ x=1     minus prediction @ x=0
            mfx[disch[i],1] = pnorm(t(be) %*% disx1) - pnorm(t(be) %*% disx0)
            # standard errors
            gr = dnorm(t(be) %*% disx1) %*% t(disx1) - dnorm(t(be) %*% disx0) %*% t(disx0)
            mfx[disch[i],2] = sqrt(gr %*% vcv %*% t(gr))
          } else {
            disx0 = disx1 = x1
            disx1[,disch[i]] = max(x1[,disch[i]])
            disx0[,disch[i]] = min(x1[,disch[i]])  
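            # survey-weighted average of each observation's discrete change
            mfx[disch[i],1] = weighted.mean(as.numeric(pnorm(disx1 %*% be) - pnorm(disx0 %*% be)), weights(design))
            # standard errors
            gr = as.numeric(dnorm(disx1 %*% be)) * disx1 - as.numeric(dnorm(disx0 %*% be)) * disx0
            avegr = as.matrix(colSums(gr * weights(design)) / sum(weights(design)))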
            mfx[disch[i],2] = sqrt(t(avegr) %*% vcv %*% avegr)
          }
        }
      } 
      mfx$discretechgvar = ifelse(rownames(mfx) %in% disch, 1, 0)
      output = list(fit=fit, mfx=mfx)
      return(output)
    }



  probitMfxSurv <-
    function(formula, 
             design, 
             atmean = TRUE, 
             robust = FALSE, 
             clustervar1 = NULL, 
             clustervar2 = NULL, 
             start = NULL 
             #           control = list() # this option is found in original mfx package
    )
    {
      #    res = probitMfxEstSurv(formula, design, atmean, robust, clustervar1, clustervar2, start, control)
      res = probitMfxEstSurv(formula, design, atmean, robust, clustervar1, clustervar2, start)

      est = NULL
      est$mfxest = cbind(dFdx = res$mfx$mfx,
                         StdErr = res$mfx$se,
                         z.value = res$mfx$mfx/res$mfx$se,
                         p.value = 2*pt(-abs(res$mfx$mfx/res$mfx$se), df = Inf))
      colnames(est$mfxest) = c("dF/dx","Std. Err.","z","P>|z|")
      rownames(est$mfxest) =  rownames(res$mfx)

      est$fit = res$fit
      est$dcvar = rownames(res$mfx[res$mfx$discretechgvar==1,])  
      est$call = match.call() 
      class(est) = "probitmfx"
      est
    }
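The standard-error step in probitMfxEstSurv with atmean = TRUE reads as an application of the delta method. As a sketch in notation chosen here (the symbols are assumptions mapped onto the code: $\beta$ is be, $\bar{x}$ is xm, $V$ is vcv, $G$ is gr, and $k$ is k1), using $\phi'(z) = -z\,\phi(z)$:

```latex
\begin{aligned}
\mathrm{mfx} &= \phi(\bar{x}'\beta)\,\beta, \\
G &= \frac{\partial\,\mathrm{mfx}}{\partial \beta'}
   = \phi(\bar{x}'\beta)\left[ I_k - (\bar{x}'\beta)\,\beta\,\bar{x}' \right], \\
\widehat{\operatorname{Var}}(\mathrm{mfx}) &= G\,V\,G'.
\end{aligned}
```

For a binary regressor the code instead reports the discrete change $\Phi(x_1'\beta) - \Phi(x_0'\beta)$, whose gradient with respect to $\beta'$ is $\phi(x_1'\beta)\,x_1' - \phi(x_0'\beta)\,x_0'$, where $x_1$ and $x_0$ are the mean vector with that regressor set to its maximum and minimum.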

Example

  # initialize sample data (seed fixed so the example is reproducible)
  set.seed(1)
  nObs = 100
  x1 = rbinom(nObs,1,.5)
  x2 = rbinom(nObs,1,.3)
  #x3 = rbinom(100,1,.9)
  x3 = runif(nObs,0,.9)

  id = 1:nObs
  w1 = sample(c(10,50,100),nObs,replace=TRUE)
  #   dependent variables
  ystar = x1 + x2 - x3 + rnorm(nObs)
  y = ifelse(ystar>0,1,0)
  #   set up data frame
  data = data.frame(id, w1, x1, x2, x3, ystar, y)

  # initialize survey
  survey.design <- svydesign(ids = data$id,
                             weights = data$w1,
                             data = data)

  # unweighted mean and standard error of x2, then the survey-weighted mean
  mean(data$x2)
  sd(data$x2)/(length(data$x2))^0.5
  svymean(~x2, design=survey.design)

  probit = svyglm(y~x1 + x2 + x3, design=survey.design, family=quasibinomial(link='probit'))
  summary(probit)

  probitMfxSurv(formula = y~x1 + x2 + x3, design = survey.design)
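The question asks about logit, while the functions above fit a probit. The following is a minimal base R sketch of the same "at the mean" calculation for a logit link, under stated assumptions: logit_mfx_at_mean is a hypothetical helper (it is not part of mfx or survey), the input numbers are invented, and every regressor is treated as continuous (no discrete-change handling for dummies).

```r
# Hypothetical helper: logit marginal effects at the mean with
# delta-method standard errors.
#   be  : named coefficient vector, e.g. coef(fit)
#   vcv : coefficient covariance matrix, e.g. vcov(fit)
#   xm  : weighted means of the model-matrix columns, same order as be
logit_mfx_at_mean <- function(be, vcv, xm) {
  nm  <- names(be)
  be  <- as.numeric(be)
  xm  <- as.numeric(xm)
  xb  <- sum(xm * be)      # linear predictor at the means
  p   <- plogis(xb)        # predicted probability at the means
  fxb <- dlogis(xb)        # logistic density, equal to p * (1 - p)
  # Jacobian of fxb * be with respect to be
  gr  <- fxb * (diag(length(be)) + (1 - 2 * p) * outer(be, xm))
  se  <- sqrt(diag(gr %*% vcv %*% t(gr)))
  data.frame(mfx = fxb * be, se = se, row.names = nm)
}

# Invented inputs, for illustration only
be  <- c("(Intercept)" = -0.5, x1 = 1.0, x3 = -0.8)
vcv <- diag(c(0.04, 0.09, 0.16))
xm  <- c(1, 0.5, 0.45)

res <- logit_mfx_at_mean(be, vcv, xm)
res[-1, ]   # drop the intercept row
```

With a real model, be and vcv would presumably come from coef() and vcov() of a svyglm fit using family = quasibinomial(link = "logit"), and xm from svymean() applied to the model matrix, as in probitMfxEstSurv above.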

For this question (r - How do I produce marginal effects for a logit model when using survey weights?), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26468360/