gpt4 book ai didi

r - R中具有分类变量的线性模型

转载 作者:行者123 更新时间:2023-12-04 12:25:44 25 4
gpt4 key购买 nike

我正在尝试使用一些分类变量拟合线性模型

model <- lm(price ~ carat+cut+color+clarity)
summary(model)

答案是:
Call:
lm(formula = price ~ carat + cut + color + clarity)

Residuals:
Min 1Q Median 3Q Max
-11495.7 -688.5 -204.1 458.2 9305.3

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3696.818 47.948 -77.100 < 2e-16 ***
carat 8843.877 40.885 216.311 < 2e-16 ***
cut.L 755.474 68.378 11.049 < 2e-16 ***
cut.Q -349.587 60.432 -5.785 7.74e-09 ***
cut.C 200.008 52.260 3.827 0.000131 ***
cut^4 12.748 42.642 0.299 0.764994
color.L 1905.109 61.050 31.206 < 2e-16 ***
color.Q -675.265 56.056 -12.046 < 2e-16 ***
color.C 197.903 51.932 3.811 0.000140 ***
color^4 71.054 46.940 1.514 0.130165
color^5 2.867 44.586 0.064 0.948729
color^6 50.531 40.771 1.239 0.215268
clarity.L 4045.728 108.363 37.335 < 2e-16 ***
clarity.Q -1545.178 102.668 -15.050 < 2e-16 ***
clarity.C 999.911 88.301 11.324 < 2e-16 ***
clarity^4 -665.130 66.212 -10.045 < 2e-16 ***
clarity^5 920.987 55.012 16.742 < 2e-16 ***
clarity^6 -712.168 52.346 -13.605 < 2e-16 ***
clarity^7 1008.604 45.842 22.002 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1167 on 4639 degrees of freedom
Multiple R-squared: 0.9162, Adjusted R-squared: 0.9159
F-statistic: 2817 on 18 and 4639 DF, p-value: < 2.2e-16

但我不明白为什么答案是“.L,.Q,.C,^4, ...”,有问题但我不知道有什么问题,我已经尝试过使用函数因子每个变量。

最佳答案

您会遇到回归函数如何处理“有序”(序数)因子变量,并且默认的一组对比是高达 n-1 次的正交多项式对比,其中 n 是该因子的级别数。解释这个结果不会很容易……特别是如果没有自然秩序。即使有,并且在这种情况下很可能有,您可能不想要默认排序(按因子级别按字母顺序排列),并且您可能不希望多项式对比中的度数超过几个。

在 ggplot2 的钻石数据集的情况下,因子级别设置正确,但大多数新手在偶然发现有序因子时会得到诸如“Excellent”<“Fair”<“Good”<“Poor”之类的有序级别。 (失败)

> levels(diamonds$cut)
[1] "Fair" "Good" "Very Good" "Premium" "Ideal"
> levels(diamonds$clarity)
[1] "I1" "SI2" "SI1" "VS2" "VS1" "VVS2" "VVS1" "IF"
> levels(diamonds$color)
[1] "D" "E" "F" "G" "H" "I" "J"

在正确设置有序因子后使用有序因子的一种方法是将它们包装在 as.numeric 中,这样可以对趋势进行线性测试。
> contrasts(diamonds$cut) <- contr.treatment(5) # Removes ordering
> model <- lm(price ~ carat+cut+as.numeric(color)+as.numeric(clarity), diamonds)
> summary(model)

Call:
lm(formula = price ~ carat + cut + as.numeric(color) + as.numeric(clarity),
data = diamonds)

Residuals:
Min 1Q Median 3Q Max
-19130.3 -696.1 -176.8 556.9 9599.8

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5189.460 36.577 -141.88 <2e-16 ***
carat 8791.452 12.659 694.46 <2e-16 ***
cut2 909.433 35.346 25.73 <2e-16 ***
cut3 1129.518 32.772 34.47 <2e-16 ***
cut4 1156.989 32.427 35.68 <2e-16 ***
cut5 1264.128 32.160 39.31 <2e-16 ***
as.numeric(color) -318.518 3.282 -97.05 <2e-16 ***
as.numeric(clarity) 522.198 3.521 148.31 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1227 on 53932 degrees of freedom
Multiple R-squared: 0.9054, Adjusted R-squared: 0.9054
F-statistic: 7.371e+04 on 7 and 53932 DF, p-value: < 2.2e-16

关于r - R中具有分类变量的线性模型,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30159162/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com