
r - Automatically removing variables from a data frame based on a VIF criterion using R

Reposted. Author: 行者123. Updated: 2023-12-05 02:36:49

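I have a series of data frames, each representing a linear model. I want to automatically remove columns from each data frame based on a VIF threshold of 10. A given data frame looks like this: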

df_nn <- structure(list(capital = c(100, 101, 102, 103, 
104, 105, 106, 107, 108, 109,
110, 111, 112, 113, 114, 115,
116, 117, 118, 119, 120, 121,
122, 123, 124, 125, 126, 127,
128, 129, 130, 131, 132), IVAE = c(109.19,
110.09, 111.84, 112.49, 111.99, 113.11, 111.89, 112.11, 112.75,
113.7, 112.93, 112.43, 114.88, 114.5, 114.93, 115.13, 105.54,
91.71, 87.93, 93.06, 96.74, 103.26, 106.76, 109.6, 110.74, 112,
112.73, 114.97, 115.01, 114.67, 115.78, 114.52, 111.91), `Índice de Producción Industrial (IPI): Industrias Manufactureras, Explotación de Minas y Canteras y Otras Actividades Industriales` = c(101.4,
103.4, 106.72, 108.45, 107.76, 107.25, 105.75, 107.03, 107.31,
106.61, 106.95, 106.61, 110.18, 108.68, 109.66, 111.32, 100.02,
76.77, 73.46, 81.99, 94.83, 100.64, 104.51, 106.74, 107.04, 108.75,
110.8, 110.59, 111.25, 108.82, 110.03, 111.32, 107.61), Construcción = c(112.25,
117.5, 124.32, 122.64, 121.21, 128.69, 122.28, 126.55, 120.13,
137.47, 129.82, 126.83, 132.92, 131.72, 137.56, 130.89, 117.08,
87.62, 67.49, 79.56, 88.97, 117.57, 110.01, 118.02, 117.61, 121.64,
120.76, 120.99, 118.96, 122.7, 122.59, 101.2, 106.3), `Comercio, Transporte y Almacenamiento, Actividades de Alojamiento y de Servicio de Comidas` = c(112.2,
113.03, 115.69, 113.74, 114.7, 115.93, 115.3, 114.25, 115.05,
116.68, 114.84, 114.56, 116.58, 117.77, 119.19, 119.15, 103.41,
76.66, 75.21, 90.32, 91.72, 97.53, 105.21, 110.43, 109.72, 112.41,
114.05, 115.88, 117.29, 115.05, 114.69, 116.79, 109.68), `Actividades Inmobiliarias` = c(113.31,
113.83, 114.69, 114.97, 115.98, 116.2, 116.22, 115.64, 115.79,
115.95, 116.24, 117.6, 117.84, 115.35, 108.98, 105.89, 103.74,
103.16, 102.5, 102.42, 102.41, 104.16, 107.74, 112.87, 116.57,
115.68, 113.47, 112.41, 112.08, 112.42, 112.74, 113.21, 112.56
), `Actividades Profesionales, Científicas, Técnicas, Administrativas, de Apoyo y Otros Servicios` = c(111.84,
111.92, 116.44, 117.77, 112.96, 114.64, 113.67, 112.33, 115.12,
113.31, 114.14, 115.46, 117.17, 120.57, 124.26, 122.68, 99.51,
86.36, 79.21, 81.56, 83.6, 88.71, 97.76, 98.16, 101.04, 102.68,
108.37, 113.64, 114.82, 115.91, 118.35, 118.74, 109.14), empleo = c(851413,
856079, 853309, 854541, 856040, 853881, 853328, 858454, 860200,
861430, 865033, 867569, 874276, 870793, 872645, 876928, 873733,
840029, 813159, 805474, 808920, 814118, 824284, 833293, 841311,
842072, 848832, 854290, 859130, 860833, 865704, 873081, 881033
)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))

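Here "capital" is the dependent variable and the remaining columns are the independent variables, all numeric.

So far I have tried the following function on a single data frame: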

library(car)

vif_fun <- function(df) {
  while (TRUE) {
    vifs <- vif(lm(capital ~ ., data = df))
    if (max(vifs) < 10) {
      break
    }
    highest <- c(names((which(vifs == max(vifs)))))
    df <- df[, -which(names(df) %in% highest)]
  }
  return(df)
}

vif_fun(df_nn)

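As long as any variable has a VIF above 10, the function should:

  • find the variable with the largest VIF
  • remove it from the data frame
  • repeat until no variable has a VIF above 10

However, whenever I run the function, I get the following error message: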

Error in terms.formula(formula, data = data) : 
'.' in formula and no 'data' argument

I tried the function with the mtcars dataset, using "mpg" in place of "capital" in the function, and it worked. Any ideas about what might be happening?
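One visible difference from mtcars is that several column names here contain spaces and commas. A minimal base-R check, assuming those names are what matters, is to compare each name with what make.names() would turn it into. The vector nms below is a hand-copied subset of the column names above, used only for illustration.

```r
# Subset of the column names from df_nn (copied by hand for illustration)
nms <- c(
  "capital",
  "IVAE",
  "Actividades Inmobiliarias",
  "Comercio, Transporte y Almacenamiento, Actividades de Alojamiento y de Servicio de Comidas",
  "empleo"
)

# A name is syntactic when make.names() leaves it unchanged
non_syntactic <- nms[nms != make.names(nms)]
non_syntactic

# What base R would rename them to (spaces and commas become dots)
make.names(non_syntactic)
```

Names flagged here need backticks whenever they appear in a formula, which mtcars columns never do.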

Best answer

A simpler option is to use clean_names from janitor to replace the unusual column names:

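library(car)  # provides vif()

vif_fun <- function(df) {
  # Convert column names to plain snake_case so none contain spaces or commas
  df <- janitor::clean_names(df)
  while (TRUE) {
    vifs <- vif(lm(capital ~ ., data = df))
    # Stop once every remaining predictor is below the threshold of 10
    if (max(vifs) < 10) {
      break
    }
    # Drop the predictor(s) with the largest VIF and refit
    highest <- names(which(vifs == max(vifs)))
    df <- df[, -which(names(df) %in% highest)]
  }
  return(df)
}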

vif_fun(df_nn)
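Since the question mentions a series of data frames, the same idea can be applied to a list with lapply. The following is a minimal sketch that depends on neither car nor janitor, under these assumptions: all predictors are numeric, no two predictors are perfectly collinear (otherwise solve() fails), and the response column already has a syntactic name. It swaps in make.names() for clean_names() and computes each VIF as the diagonal of the inverse correlation matrix of the predictors, which equals 1 / (1 - R_j^2) for numeric predictors. The names vif_base, drop_high_vif, and demo are hypothetical and do not come from the original answer.

```r
# VIF of each predictor: diagonal of the inverse correlation matrix
vif_base <- function(predictors) {
  v <- diag(solve(cor(predictors)))
  names(v) <- colnames(predictors)
  v
}

# Repeatedly drop the predictor with the largest VIF until all are below threshold
drop_high_vif <- function(df, response = "capital", threshold = 10) {
  df <- as.data.frame(df)
  # Replace spaces, commas, etc. so every column name is syntactic
  names(df) <- make.names(names(df), unique = TRUE)
  predictors <- setdiff(names(df), response)
  # With a single predictor there is nothing left to compare
  while (length(predictors) >= 2) {
    vifs <- vif_base(df[, predictors, drop = FALSE])
    if (max(vifs) < threshold) break
    predictors <- setdiff(predictors, names(which.max(vifs)))
  }
  df[, c(response, predictors), drop = FALSE]
}

# Synthetic demo data: "x two" is almost a copy of "x one"
set.seed(1)
n <- 50
x1 <- rnorm(n)
demo <- data.frame(
  capital = rnorm(n),
  `x one` = x1,
  `x two` = x1 + rnorm(n, sd = 0.01),
  x3 = rnorm(n),
  check.names = FALSE
)

result <- drop_high_vif(demo)
names(result)

# Applying the same function to a (hypothetical) list of data frames
cleaned <- lapply(list(a = demo, b = demo), drop_high_vif)
```

Unlike the loop above, this version removes one variable per iteration even when VIFs tie, and it stops once only one predictor is left.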

Regarding r - Automatically removing variables from a data frame based on a VIF criterion using R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70174502/
