
r - Missing-data behavior in lm: complete cases used even with predictors without missing data

Reposted. Author: 行者123. Updated: 2023-12-02 11:16:01

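My question: what is the most efficient way to drop a predictor that contains NAs so that the complete cases are determined without that predictor?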

The question comes from the following regression with NAs, where Ozone (mostly) and Solar.R have missing values.

data(airquality)
summary(airquality)
# Ozone Solar.R Wind Temp Month
# Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00 Min. :5.000
# 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00 1st Qu.:6.000
# Median : 31.50 Median :205.0 Median : 9.700 Median :79.00 Median :7.000
# Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88 Mean :6.993
# 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00 3rd Qu.:8.000
# Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00 Max. :9.000
# NA's :37 NA's :7
# Day
# Min. : 1.0
# 1st Qu.: 8.0
# Median :16.0
# Mean :15.8
# 3rd Qu.:23.0
# Max. :31.0
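Before fitting, it helps to count where the NAs are and how many rows survive listwise deletion. A minimal sketch using only base R (`is.na()`, `colSums()`, `complete.cases()`); the object names are illustrative:

```r
data(airquality)

# Missing values per column: only Ozone and Solar.R contain NAs
na_per_column <- colSums(is.na(airquality))
print(na_per_column)

# Rows with no NA in any column (what lm() keeps when every column enters the model)
n_complete_all <- sum(complete.cases(airquality))

# Rows with no NA once Ozone is ignored (only the NAs in Solar.R still matter)
n_complete_no_ozone <- sum(complete.cases(airquality[, setdiff(names(airquality), "Ozone")]))

cat("complete cases, all columns:   ", n_complete_all, "\n")
cat("complete cases, Ozone excluded:", n_complete_no_ozone, "\n")
```

Out of 153 rows, this gives 111 and 146, matching the "(42 observations deleted due to missingness)" and "(7 observations deleted due to missingness)" lines in the outputs.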

Regression of Wind on all the remaining variables. Only complete cases are used.

summary(lm(Wind ~ ., data = airquality))
#
# Call:
# lm(formula = Wind ~ ., data = airquality)
#
# Residuals:
# Min 1Q Median 3Q Max
# -4.3908 -2.2800 -0.3078 1.4132 9.6501
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 15.519460 2.918393 5.318 5.96e-07 ***
# Ozone -0.060746 0.011798 -5.149 1.23e-06 ***
# Solar.R 0.003791 0.003216 1.179 0.241
# Temp -0.036604 0.044576 -0.821 0.413
# Month -0.159671 0.208082 -0.767 0.445
# Day 0.017353 0.031238 0.556 0.580
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 2.822 on 105 degrees of freedom
# (42 observations deleted due to missingness)
# Multiple R-squared: 0.3994, Adjusted R-squared: 0.3708
# F-statistic: 13.96 on 5 and 105 DF, p-value: 1.857e-10

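If Ozone is removed with "- Ozone", complete cases are still determined including Ozone (again 42 observations are deleted). That is not the same as removing Ozone manually by listing the predictors explicitly, where only the 7 rows with a missing Solar.R are dropped.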
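One way to see what is happening: `model.frame()`, which `lm()` calls internally, uses all the variables named in the formula, and a variable subtracted with `-` is still named there. A sketch with base `terms()` and `model.frame()`, assuming the default `na.action = na.omit`:

```r
data(airquality)

f_minus <- Wind ~ . - Ozone

# The model terms no longer include Ozone ...
tt <- terms(f_minus, data = airquality)
print(attr(tt, "term.labels"))

# ... but Ozone is still among the formula's variables
print(all.vars(attr(tt, "variables")))

# So model.frame() pulls the Ozone column in, and na.omit() drops rows where it is NA
mf_minus  <- model.frame(f_minus, data = airquality)
mf_manual <- model.frame(Wind ~ Solar.R + Temp + Month + Day, data = airquality)

cat("rows kept with '. - Ozone':   ", nrow(mf_minus), "\n")
cat("rows kept with explicit terms:", nrow(mf_manual), "\n")
```

This reports 111 rows for the first formula and 146 for the second. The Ozone column is present in `mf_minus` even though it never enters the design matrix.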

summary(lm(Wind ~ . - Ozone, data = airquality))
#
# Call:
# lm(formula = Wind ~ . - Ozone, data = airquality)
#
# Residuals:
# Min 1Q Median 3Q Max
# -6.012 -2.323 -0.361 1.493 9.605
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 24.3159074 2.6354288 9.227 3.09e-15 ***
# Solar.R 0.0009228 0.0035281 0.262 0.794
# Temp -0.1900820 0.0369159 -5.149 1.21e-06 ***
# Month 0.0313046 0.2280600 0.137 0.891
# Day 0.0008969 0.0346116 0.026 0.979
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 3.143 on 106 degrees of freedom
# (42 observations deleted due to missingness)
# Multiple R-squared: 0.2477, Adjusted R-squared: 0.2193
# F-statistic: 8.727 on 4 and 106 DF, p-value: 3.961e-06

summary(lm(Wind ~ Solar.R + Temp + Month + Day, data = airquality))
#
# Call:
# lm(formula = Wind ~ Solar.R + Temp + Month + Day, data = airquality)
#
# Residuals:
# Min 1Q Median 3Q Max
# -8.1779 -2.2063 -0.2757 1.9448 9.3510
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 23.660271 2.416766 9.790 < 2e-16 ***
# Solar.R 0.002980 0.003113 0.957 0.340
# Temp -0.186386 0.032725 -5.695 6.89e-08 ***
# Month 0.074952 0.206334 0.363 0.717
# Day -0.011028 0.030304 -0.364 0.716
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 3.158 on 141 degrees of freedom
# (7 observations deleted due to missingness)
# Multiple R-squared: 0.2125, Adjusted R-squared: 0.1901
# F-statistic: 9.511 on 4 and 141 DF, p-value: 7.761e-07
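Writing the predictors out by hand avoids the problem but does not scale to many columns. A hedged sketch that builds the same formula programmatically with base R's `reformulate()`, so Ozone never appears in the formula at all. The variable names `response`, `excluded` and `predictors` are illustrative:

```r
data(airquality)

response <- "Wind"
excluded <- "Ozone"   # predictor(s) to leave out entirely

# Every column except the response and the excluded predictor(s)
predictors <- setdiff(names(airquality), c(response, excluded))

# Builds Wind ~ Solar.R + Temp + Month + Day without mentioning Ozone
f <- reformulate(predictors, response = response)
print(f)

fit <- lm(f, data = airquality)
summary(fit)
```

The coefficients and the "7 observations deleted" line should match the explicit-formula fit above. The printed `Call:` will show `formula = f` instead of the expanded formula.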

Best answer

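That Wind ~ . - Ozone still takes Ozone into account when finding complete cases is indeed unfortunate and surprising. If you want to pursue it, it seems worth raising on the r-devel@r-project.org mailing list. In the meantime, how about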

summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
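If this comes up often, the idea of dropping the column from the data before `lm()` ever sees it can be wrapped in a small helper. `lm_drop()` is a hypothetical name, not part of any package. This sketch assumes the columns to drop are given by name:

```r
data(airquality)

# Hypothetical helper: fit `formula` on `data` after removing the columns in `exclude`,
# so those columns cannot affect which rows count as complete cases.
lm_drop <- function(formula, data, exclude, ...) {
  kept <- data[, setdiff(names(data), exclude), drop = FALSE]
  lm(formula, data = kept, ...)
}

fit_subset <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))
fit_helper <- lm_drop(Wind ~ ., airquality, exclude = "Ozone")

# Both fits use the 146 rows where Solar.R is observed
print(nobs(fit_subset))
print(all.equal(coef(fit_subset), coef(fit_helper)))
```

Subsetting the data frame, rather than editing the formula, keeps `.` usable. This matters because `.` expands only to the columns that are actually present.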

A similar question, "r - Missing-data behavior in lm: complete cases used even with predictors without missing data", can be found on Stack Overflow: https://stackoverflow.com/questions/47753117/
