
r - Exclude outliers from the regression lines fitted through a scatter plot, without removing the outliers from the plot

Reposted. Author: 行者123. Updated: 2023-12-04 13:08:10

I have the following data, on which I run the ggplot code below:

data <- structure(list(country_mean_rep = structure(c(73.6995708154506, 
93.5501285347044, 85.1529051987768, 91.1017369727047, 79.5562130177515,
84.6751054852321, 89.8, 86.8826405867971, 94.2247191011236, 70.2321428571429,
88.4107142857143), label = "label", format.stata = "%9.2f"),
country_mean_crime = c(0.0944206008583691, 0.0565552699228792,
0.0336391437308868, 0.205955334987593, 0.130177514792899,
0.282700421940928, 0.220512820512821, 0.415647921760391,
0.387640449438202, 0.200892857142857, 0.292207792207792),
country_name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 11L, 12L,
14L, 16L, 20L), .Label = c("Albania", "Armenia", "Azerbaijan",
"Belarus", "Bosnia and Herzegovina", "Brazil", "Bulgaria",
"Cambodia", "Chile", "CostaRica", "Croatia", "Czech", "Ecuador",
"Estonia", "FYROM", "Georgia", "Germany", "Greece", "Guyana",
"Hungary", "Ireland", "Kazakhstan", "Kenya", "Kyrgyzstan",
"Latvia", "Lithuania", "Malawi", "Mali", "Moldova", "Philippines",
"Poland", "Portugal", "Romania", "Russia", "Senegal", "Serbia&Montenegro",
"Slovakia", "Slovenia", "South Africa", "South Korea", "Spain",
"SriLanka", "Tajikistan", "Turkey", "Ukraine", "Uzbekistan",
"Vietnam"), class = "factor")), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))

# On which I would like to run the following code:

library(ggplot2)

# Scatter plot with linear, quadratic and cubic fits over all countries
ggplot(data, aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_point() +
  geom_smooth(aes(colour = "linear", fill = "linear"),
              method = "lm",
              formula = y ~ x) +
  geom_smooth(aes(colour = "quadratic", fill = "quadratic"),
              method = "lm",
              formula = y ~ x + I(x^2)) +
  geom_smooth(aes(colour = "cubic", fill = "cubic"),
              method = "lm",
              formula = y ~ x + I(x^2) + I(x^3)) +
  labs(colour = "Functional Form", fill = "Functional Form") +
  geom_text(aes(label = country_name), nudge_y = 0.02) +
  theme_bw()


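Now suppose the Czech Republic is an outlier and I want to exclude it from the fits (especially the linear fit). Note that I know there is nothing wrong with the Czech Republic in this example; I need to know how to do this so that I can deal with the appropriate outliers in my actual data.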

Is there a way to exclude it from the fit only, while keeping the point in the plot?

Best answer

One approach is to give the fitted layers and the plotted points different data:

library(ggplot2)

# The default plot data excludes 'Czech', so all three fits ignore it
ggplot(subset(data, country_name != 'Czech'),
       aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_smooth(aes(colour = "linear", fill = "linear"),
              method = "lm",
              formula = y ~ x) +
  geom_smooth(aes(colour = "quadratic", fill = "quadratic"),
              method = "lm",
              formula = y ~ x + I(x^2)) +
  geom_smooth(aes(colour = "cubic", fill = "cubic"),
              method = "lm",
              formula = y ~ x + I(x^2) + I(x^3)) +
  labs(colour = "Functional Form", fill = "Functional Form") +
  # Points and labels are drawn from the full data, so 'Czech' stays visible
  geom_point(data = data, inherit.aes = FALSE,
             aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_text(data = data, inherit.aes = FALSE, nudge_y = 0.02,
            aes(label = country_name, x = country_mean_rep, y = country_mean_crime)) +
  theme_bw()

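In this case the three linear-model fits use the subsetted data, while the calls to geom_point and geom_text are given the full data and, because of inherit.aes = FALSE, do not inherit the aesthetics from the ggplot() call.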
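
The same idea can also be written the other way round: keep the full data in ggplot() and hand only the smoothing layer a reduced data set. The sketch below is not from the original answer. It relies on the documented behaviour that a layer's data argument may be a function, which is called with the plot data. The toy data frame, its column names and the helper drop_outliers are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's `data`
toy <- data.frame(
  rep_mean   = c(70, 74, 80, 85, 87, 90, 94),
  crime_mean = c(0.20, 0.09, 0.13, 0.03, 0.42, 0.22, 0.39),
  country    = c("A", "B", "C", "D", "Outlier", "E", "F")
)

# Hypothetical helper: returns the plot data without the rows
# that should be excluded from the fit
drop_outliers <- function(d) subset(d, country != "Outlier")

p <- ggplot(toy, aes(x = rep_mean, y = crime_mean)) +
  # Points and labels use the full data inherited from ggplot()
  geom_point() +
  geom_text(aes(label = country), nudge_y = 0.02) +
  # Only this layer sees the reduced data, so only the fit ignores the outlier
  geom_smooth(data = drop_outliers, method = "lm", formula = y ~ x) +
  theme_bw()

print(p)
```

With this layout, geom_point and geom_text need neither a data argument nor inherit.aes = FALSE. Further fits (quadratic, cubic) would each take the same data = drop_outliers argument.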

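Regarding "r - Exclude outliers from the regression lines fitted through a scatter plot, without removing the outliers from the plot", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68396603/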
