
r - Multiple linear regression on election/census data, and the resulting error

Reposted. Author: 行者123. Updated: 2023-12-03 00:11:09

I have this data:

```r
library(tidyverse)

df <- tibble(
  "racecmb" = c("White", "White", "White", "White", "White", "White",
                "White", "White", "Black", "White", "Mixed",
                "Black", "White", "White", "White"),
  "age" = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57),
  "income" = c("10 to under $20,000", "100 to under $150,000",
               "75 to under $100,000", "75 to under $100,000",
               "10 to under $20,000", "20 to under $30,000",
               "100 to under $150,000", "20 to under $30,000",
               "100 to under $150,000", "20 to under $30,000",
               "100 to under $150,000", "Less than $10,000",
               "$150,000 or more", " 30 to under $40,000",
               "50 to under $75,000"),
  "party" = c("Independent", "Independent", "Independent", "Democrat",
              "Independent", "Republican", "Independent",
              "Independent", "Democrat", "Republican", "Republican",
              "Democrat", "Democrat", "Independent", "Independent"),
  "ideology" = c("Moderate", "Moderate", "Conservative", "Moderate",
                 "Moderate", "Very conservative", "Moderate",
                 "Conservative",
                 "Conservative", "Moderate", "Conservative",
                 "Very conservative", "Liberal", "Moderate", "Conservative")
)
```

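I want to run (and have already tried running) a simple multiple regression:

```r
# party is a character column; ideology is the column name in df
regression <- lm(party ~ income + ideology + age, data = df) %>%
  summary()
```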

I get this error:

```
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y'
```
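The error comes from the response, not the predictors: lm() asks for a numeric response, and a character column such as party is coerced to numbers, which turns every label into NA. The minimal reproduction below uses a hypothetical four-row data frame (toy) to show that coercion and the resulting lm() failure; it is a sketch, not the asker's exact data.

```r
# Hypothetical toy data: a character response, as in the question
toy <- data.frame(
  party = c("Democrat", "Republican", "Independent", "Democrat"),
  age   = c(40, 55, 30, 62),
  stringsAsFactors = FALSE
)

# Coercing party labels to numbers yields only NA values
y_num <- suppressWarnings(as.numeric(toy$party))
print(y_num)  # [1] NA NA NA NA

# lm() performs the same coercion on the response, so the fit fails
msg <- tryCatch(
  suppressWarnings(lm(party ~ age, data = toy)),
  error = function(e) conditionMessage(e)
)
print(msg)
```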

My goal is to explain how people vote, but I don't know how to encode the data effectively for my model.

Any comments or suggestions would be appreciated.

Best answer

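First, lm() is not a good fit for a categorical response variable such as party. Instead you can use rpart(), which outputs a predicted category (class), or a multinomial logit/probit regression, which returns the probability of each outcome given the predictors.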
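A minimal sketch of the multinomial logit route, assuming the nnet package's multinom() (a recommended package distributed with R; the answer does not say which package it means). Income is left out only to keep this 15-row example small; with so few rows the estimates are unstable, so treat it as a demonstration of the mechanics rather than a usable model.

```r
library(nnet)  # assumption: nnet::multinom as the multinomial logit implementation

df <- data.frame(
  party = c("Independent", "Independent", "Independent", "Democrat",
            "Independent", "Republican", "Independent", "Independent",
            "Democrat", "Republican", "Republican", "Democrat",
            "Democrat", "Independent", "Independent"),
  ideology = c("Moderate", "Moderate", "Conservative", "Moderate",
               "Moderate", "Very conservative", "Moderate", "Conservative",
               "Conservative", "Moderate", "Conservative",
               "Very conservative", "Liberal", "Moderate", "Conservative"),
  age = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57),
  stringsAsFactors = FALSE
)

# The response must be a factor; its first level ("Democrat") is the baseline category
df$party <- factor(df$party)
df$ideology <- factor(df$ideology)

# trace = FALSE silences the optimizer's iteration log
fit <- multinom(party ~ ideology + age, data = df, trace = FALSE)

# One column of predicted probabilities per party; each row sums to 1
probs <- predict(fit, newdata = df, type = "probs")
round(head(probs), 3)
```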

Packages to install: rpart and a statistical modeling package.
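A sketch of the rpart() route, assuming a classification tree (method = "class") on ideology and age only. rpart's default minsplit of 20 exceeds the 15 rows here, so it is lowered to 4; that value is illustrative, not a recommendation.

```r
library(rpart)  # recommended package distributed with R

df <- data.frame(
  party = factor(c("Independent", "Independent", "Independent", "Democrat",
                   "Independent", "Republican", "Independent", "Independent",
                   "Democrat", "Republican", "Republican", "Democrat",
                   "Democrat", "Independent", "Independent")),
  ideology = factor(c("Moderate", "Moderate", "Conservative", "Moderate",
                      "Moderate", "Very conservative", "Moderate", "Conservative",
                      "Conservative", "Moderate", "Conservative",
                      "Very conservative", "Liberal", "Moderate", "Conservative")),
  age = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57)
)

# Grow a classification tree for the categorical response
tree <- rpart(party ~ ideology + age, data = df, method = "class",
              control = rpart.control(minsplit = 4, cp = 0.01))
print(tree)

# Predicted class for each row, compared with the observed party
pred <- predict(tree, newdata = df, type = "class")
table(observed = df$party, predicted = pred)
```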

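If your response variable is not categorical, you can convert your categorical predictors into dummy variables and then run a regression on them (remember to leave one level out as the baseline).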
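For reference, base R already does this dummy coding when a predictor is a factor: model.matrix(), which lm() calls internally, creates one 0/1 column per level except the first, and that first level becomes the baseline. The sketch below uses the question's ideology values; relevel() is shown as one way to pick a different baseline.

```r
ideology <- factor(c("Moderate", "Moderate", "Conservative", "Moderate",
                     "Moderate", "Very conservative", "Moderate", "Conservative",
                     "Conservative", "Moderate", "Conservative",
                     "Very conservative", "Liberal", "Moderate", "Conservative"))

# Levels are sorted alphabetically, so "Conservative" is the baseline
levels(ideology)

# Treatment coding: intercept plus one column per non-baseline level
X <- model.matrix(~ ideology)
colnames(X)
# "(Intercept)" "ideologyLiberal" "ideologyModerate" "ideologyVery conservative"

# Make "Moderate" the baseline instead
ideology2 <- relevel(ideology, ref = "Moderate")
X2 <- model.matrix(~ ideology2)
colnames(X2)
```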

This can be done quickly with the fastDummies package:

Example:

```r
library(fastDummies)
df <- dummy_cols(df, select_columns = "ideology")
```

If your sample size is reasonably large, you may also want to consider interactions between the dummy variables in your model.
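A small sketch of why interactions need a larger sample, using the question's racecmb and ideology values: in an R formula, a * b expands to a + b + a:b, and every interaction column is a product of two dummies. Here that gives 1 intercept + 3 ideology dummies + 2 race dummies + 6 interaction columns = 12 columns for only 15 rows, and some interaction columns are all zero because those combinations never occur.

```r
race <- factor(c("White", "White", "White", "White", "White", "White",
                 "White", "White", "Black", "White", "Mixed",
                 "Black", "White", "White", "White"))
ideology <- factor(c("Moderate", "Moderate", "Conservative", "Moderate",
                     "Moderate", "Very conservative", "Moderate", "Conservative",
                     "Conservative", "Moderate", "Conservative",
                     "Very conservative", "Liberal", "Moderate", "Conservative"))

# Main effects plus all pairwise dummy interactions
X <- model.matrix(~ ideology * race)
ncol(X)  # 12

# Interaction columns whose combination never appears in the data
inter <- X[, grepl(":", colnames(X))]
colnames(inter)[colSums(inter) == 0]
```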

A similar question, "r - Multiple linear regression on election/census data, and the resulting error", can be found on Stack Overflow: https://stackoverflow.com/questions/51273731/
