r - Simulate data for logistic regression with a fixed r2

Reposted. Author: 行者123. Updated: 2023-12-04 12:27:43

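I want to simulate data for a logistic regression in which I can specify the explained variance in advance. Take a look at the code below. I simulate four independent variables and set each logit coefficient to log(2) = 0.69. This works well, and the explained variance (I report Cox & Snell's r2) is 0.34.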

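However, I need to specify the regression coefficients in such a way that a prespecified r2 results from the regression. Suppose I want to produce an r2 of exactly 0.1. How do the coefficients need to be specified? I'm struggling a bit with this.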

# Create independent variables
sigma.1 <- matrix(c(1,0.25,0.25,0.25,
0.25,1,0.25,0.25,
0.25,0.25,1,0.25,
0.25,0.25,0.25,1),nrow=4,ncol=4)
mu.1 <- rep(0,4)
n.obs <- 500000

library(MASS)
sample1 <- as.data.frame(mvrnorm(n = n.obs, mu.1, sigma.1, empirical=FALSE))

# Create latent continuous response variable
sample1$ystar <- 0 + log(2)*sample1$V1 + log(2)*sample1$V2 + log(2)*sample1$V3 + log(2)*sample1$V4

# Construct binary response variable
sample1$prob <- exp(sample1$ystar) / (1 + exp(sample1$ystar))
sample1$y <- rbinom(n.obs,size=1,prob=sample1$prob)

# Logistic regression
logreg <- glm(y ~ V1 + V2 + V3 + V4, data=sample1, family=binomial)
summary(logreg)

The output is:
Call:
glm(formula = y ~ V1 + V2 + V3 + V4, family = binomial, data = sample1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.7536  -0.7795  -0.0755   0.7813   3.3382

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.002098   0.003544  -0.592    0.554
V1           0.691034   0.004089 169.014   <2e-16 ***
V2           0.694052   0.004088 169.776   <2e-16 ***
V3           0.693222   0.004079 169.940   <2e-16 ***
V4           0.699091   0.004081 171.310   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 693146 on 499999 degrees of freedom
Residual deviance: 482506 on 499995 degrees of freedom
AIC: 482516

Number of Fisher Scoring iterations: 5

Cox and Snell's r2 is given by:
library(pscl)
pR2(logreg)["r2ML"]

> pR2(logreg)["r2ML"]
r2ML
0.3436523
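
For reference, the standard definition of Cox & Snell's r2 can be written in terms of the two deviances that glm prints, which makes it easy to sanity-check the value by hand. The symbols D_0 (null deviance), D_M (residual deviance), L_0 and L_M (the corresponding likelihoods) are notation introduced here, not taken from the question; the numbers plugged in are the rounded deviances from the summary output above.

```latex
R^2_{\mathrm{CS}}
  = 1 - \left(\frac{L_0}{L_M}\right)^{2/n}
  = 1 - \exp\!\left(-\frac{D_0 - D_M}{n}\right)

% With the printed values D_0 = 693146, D_M = 482506, n = 500000:
R^2_{\mathrm{CS}}
  \approx 1 - \exp\!\left(-\frac{693146 - 482506}{500000}\right)
  = 1 - \exp(-0.42128)
  \approx 0.34
```

This agrees with the 0.34 reported by pR2 to two decimal places.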

Best answer

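If you add a random error term to the ystar variable, producing ystar.r, and then use that instead, you can adjust the standard deviation of the error until the r2 meets your specification.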

sample1$ystar.r <- sample1$ystar+rnorm(n.obs, 0, 3.8)  # tried a few values
sample1$prob <- exp(sample1$ystar.r) / (1 + exp(sample1$ystar.r))
sample1$y <- rbinom(n.obs,size=1,prob=sample1$prob)
logreg <- glm(y ~ V1 + V2 + V3 + V4, data=sample1, family=binomial)
summary(logreg) # the estimates "shrink"
pR2(logreg)["r2ML"]
#-------
r2ML
0.1014792
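
The "tried a few values" step above could be automated with a one-dimensional root search. What follows is only a minimal sketch under several assumptions that are not part of the original answer: the helper name cox.snell, its argument noise.sd, and the vectors z and u are hypothetical; n.obs is reduced to 20000 so each fit is fast; r2 is computed directly from the glm deviances instead of calling pscl; and the binary response is drawn by comparing fixed uniforms with the probabilities (rather than calling rbinom each time) so that the same random draws are reused for every candidate standard deviation.

```r
library(MASS)  # mvrnorm

set.seed(1)
n.obs <- 20000                      # assumption: smaller than the question's 500000, for speed
sigma.1 <- matrix(0.25, 4, 4)
diag(sigma.1) <- 1
X <- as.data.frame(mvrnorm(n.obs, rep(0, 4), sigma.1))   # columns V1..V4
ystar <- log(2) * rowSums(X)        # same latent predictor as in the question

# Draw the randomness once and reuse it for every candidate SD,
# so the objective below is (almost) a deterministic function of noise.sd
z <- rnorm(n.obs)                   # standard-normal noise, scaled inside the helper
u <- runif(n.obs)                   # uniforms used to turn probabilities into 0/1

# Hypothetical helper: Cox & Snell r2 of the fitted model for a given noise SD
cox.snell <- function(noise.sd) {
  p <- plogis(ystar + noise.sd * z) # plogis(x) = exp(x) / (1 + exp(x))
  y <- as.integer(u < p)
  fit <- glm(y ~ V1 + V2 + V3 + V4, data = cbind(X, y = y), family = binomial)
  1 - exp(-(fit$null.deviance - fit$deviance) / n.obs)
}

target <- 0.1
# r2 is about 0.34 with no noise and falls as noise grows, so the root is bracketed
sol <- uniroot(function(s) cox.snell(s) - target, interval = c(0, 10))
sol$root                            # noise SD that gives roughly the target r2
cox.snell(sol$root)                 # r2 achieved at that SD
```

The search interval c(0, 10) is a guess that happens to bracket the target here; a different target or different coefficients may need a wider interval. Because the data are simulated, the SD found this way depends on the seed and sample size, and r2 for a fresh sample generated with that SD will only be close to the target, not exactly equal to it.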

Regarding r - Simulate data for logistic regression with a fixed r2, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49300320/
