gpt4 book ai didi

r - R中的两级最小二乘

转载 作者:行者123 更新时间:2023-12-04 11:14:22 24 4
gpt4 key购买 nike

我想在R中进行两阶段概率最小二乘回归。有人知道该怎么做吗?那里有什么包裹吗?我知道可以使用Stata来做到这一点,所以我想可以使用R来做到这一点。

最佳答案

当您说“两级概率最小二乘”时,您可能想更具体一些。由于您是指实现该目标的Stata程序,所以我猜您正在谈论CDSIMEQ软件包,该软件包为Heckit模型(也称为通用Tobit,也称为Tobit II型模型等)实现了Amemiya(1978)程序。正如Grant所说,systemfit会为您做一个Tobit,但不会用两个方程式。 MicEcon软件包确实有一个Heckit(但是该软件包 split 了很多次,我不知道它现在在哪里)。

如果您需要CDSIMEQ的功能,则可以在R中轻松实现。我编写了一个复制CDSIMEQ的函数:

tspls <- function(formula1, formula2, data) {
# The Continous model
mf1 <- model.frame(formula1, data)
y1 <- model.response(mf1)
x1 <- model.matrix(attr(mf1, "terms"), mf1)

# The dicontionous model
mf2 <- model.frame(formula2, data)
y2 <- model.response(mf2)
x2 <- model.matrix(attr(mf2, "terms"), mf2)

# The matrix of all the exogenous variables
X <- cbind(x1, x2)
X <- X[, unique(colnames(X))]

J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1))
J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2))
for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1
for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1

# Step 1:
cat("\n\tNOW THE FIRST STAGE REGRESSION")
m1 <- lm(y1 ~ X - 1)
m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit"))
print(summary(m1))
print(summary(m2))

yhat1 <- m1$fitted.values
yhat2 <- X %*% coef(m2)

PI1 <- m1$coefficients
PI2 <- m2$coefficients
V0 <- vcov(m2)
sigma1sq <- sum(m1$residuals ^ 2) / m1$df.residual
sigma12 <- 1 / length(y2) * sum(y2 * m1$residuals / dnorm(yhat2))

# Step 2:
cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS")

m1 <- lm(y1 ~ yhat2 + x1 - 1)
m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))
sm1 <- summary(m1)
sm2 <- summary(m2)
print(sm1)
print(sm2)

# Step 3:
cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n")
gamma1 <- m1$coefficients[1]
gamma2 <- m2$coefficients[1]

cc <- sigma1sq - 2 * gamma1 * sigma12
dd <- gamma2 ^ 2 * sigma1sq - 2 * gamma2 * sigma12
H <- cbind(PI2, J1)
G <- cbind(PI1, J2)

XX <- crossprod(X) # X'X
HXXH <- solve(t(H) %*% XX %*% H) # (H'X'XH)^(-1)
HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'X'V0X'XH
Valpha1 <- cc * HXXH + gamma1 ^ 2 * HXXH %*% HXXVXXH %*% HXXH

GV <- t(G) %*% solve(V0) # G'V0^(-1)
GVG <- solve(GV %*% G) # (G'V0^(-1)G)^(-1)
Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG

ans1 <- coef(sm1)
ans2 <- coef(sm2)

ans1[,2] <- sqrt(diag(Valpha1))
ans2[,2] <- sqrt(diag(Valpha2))
ans1[,3] <- ans1[,1] / ans1[,2]
ans2[,3] <- ans2[,1] / ans2[,2]
ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE)
ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE)

cat("Continuous:\n")
print(ans1)
cat("Dichotomous:\n")
print(ans2)
}

为了进行比较,我们可以将CDSIMEQ作者的样本复制到他们的 article about the package中。
> library(foreign)
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta")
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4,
+ dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7,
+ data = cdsimeq)

NOW THE FIRST STAGE REGRESSION
Call:
lm(formula = y1 ~ X - 1)

Residuals:
Min 1Q Median 3Q Max
-1.885921 -0.438579 -0.006262 0.432156 2.133738

Coefficients:
Estimate Std. Error t value Pr(>|t|)
X(Intercept) 0.010752 0.020620 0.521 0.602187
Xexog3 0.158469 0.021862 7.249 8.46e-13 ***
Xexog2 -0.009669 0.021666 -0.446 0.655488
Xexog1 0.159955 0.021260 7.524 1.19e-13 ***
Xexog4 0.316575 0.022456 14.097 < 2e-16 ***
Xexog5 0.497207 0.021356 23.282 < 2e-16 ***
Xexog6 -0.078017 0.021755 -3.586 0.000352 ***
Xexog7 0.161177 0.022103 7.292 6.23e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6488 on 992 degrees of freedom
Multiple R-squared: 0.5972, Adjusted R-squared: 0.594
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16


Call:
glm(formula = y2 ~ X - 1, family = binomial(link = "probit"))

Deviance Residuals:
Min 1Q Median 3Q Max
-2.49531 -0.59244 0.01983 0.59708 2.41810

Coefficients:
Estimate Std. Error z value Pr(>|z|)
X(Intercept) 0.08352 0.05280 1.582 0.113692
Xexog3 0.21345 0.05678 3.759 0.000170 ***
Xexog2 0.21131 0.05471 3.862 0.000112 ***
Xexog1 0.45591 0.06023 7.570 3.75e-14 ***
Xexog4 0.39031 0.06173 6.322 2.57e-10 ***
Xexog5 0.75955 0.06427 11.818 < 2e-16 ***
Xexog6 0.85461 0.06831 12.510 < 2e-16 ***
Xexog7 -0.16691 0.05653 -2.953 0.003152 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.14 on 992 degrees of freedom
AIC: 770.14

Number of Fisher Scoring iterations: 6


NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS
Call:
lm(formula = y1 ~ yhat2 + x1 - 1)

Residuals:
Min 1Q Median 3Q Max
-2.32152 -0.53160 0.04886 0.53502 2.44818

Coefficients:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.257592 0.021451 12.009 <2e-16 ***
x1(Intercept) 0.012185 0.024809 0.491 0.623
x1exog3 0.042520 0.026735 1.590 0.112
x1exog2 0.011854 0.026723 0.444 0.657
x1exog1 0.007773 0.028217 0.275 0.783
x1exog4 0.318636 0.028311 11.255 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7803 on 994 degrees of freedom
Multiple R-squared: 0.4163, Adjusted R-squared: 0.4128
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16


Call:
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))

Deviance Residuals:
Min 1Q Median 3Q Max
-2.49610 -0.58595 0.01969 0.59857 2.41281

Coefficients:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26287 0.16061 7.863 3.75e-15 ***
x2(Intercept) 0.07080 0.05276 1.342 0.179654
x2exog1 0.25093 0.06466 3.880 0.000104 ***
x2exog2 0.22604 0.05389 4.194 2.74e-05 ***
x2exog5 0.12912 0.09510 1.358 0.174544
x2exog6 0.95609 0.07172 13.331 < 2e-16 ***
x2exog7 -0.37128 0.06759 -5.493 3.94e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.21 on 993 degrees of freedom
AIC: 768.21

Number of Fisher Scoring iterations: 6

NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS

Continuous:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.25759209 0.1043073 2.46955009 0.01369540
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445
x1exog3 0.04252006 0.1291588 0.32920764 0.74206810
x1exog2 0.01185438 0.1290754 0.09184073 0.92684309
x1exog1 0.00777347 0.1363643 0.05700519 0.95455252
x1exog4 0.31863627 0.1367881 2.32941597 0.02003661
Dichotomous:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26286574 0.7395166 1.7076909 0.0876937093
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867
x2exog1 0.25092561 0.3126763 0.8025092 0.4222584495
x2exog2 0.22603717 0.2739307 0.8251618 0.4092797527
x2exog5 0.12911922 0.4822986 0.2677163 0.7889176766
x2exog6 0.95609385 0.2823662 3.3860070 0.0007091758
x2exog7 -0.37128221 0.3265478 -1.1369920 0.2555416141

关于r - R中的两级最小二乘,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/3843882/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com