
r - R: How to remove outliers from a smoother in ggplot2?

Reposted. Author: 行者123. Updated: 2023-12-04 04:44:43

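I have the following data set that I am trying to plot with ggplot2. It is a time series of three experiments, A1, B1 and C1, and each experiment has three replicates.

I am trying to add a stat that detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown), but I expect a function for this already exists and I just have not found it.

I have looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I could not tell from the Hmisc help documentation whether it removes outliers.

Is there a function in ggplot that removes outliers like this, or where in my code below would I add my own function?

Edit: I just saw this question ("How to use Outlier Tests in R Code") and noticed that Hadley recommends using a robust method such as rlm. I am plotting bacterial growth curves, so I do not think a linear model is the best choice, but any advice on other models, or on using robust models in this situation, would be appreciated.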
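To make the idea of a robust method concrete, here is a minimal sketch that compares an ordinary least-squares fit with `MASS::rlm` on the A1 replicates from the example data. It only illustrates how a robust fit downweights a suspicious reading; whether a straight line suits a growth curve is a separate question. The names `a1`, `fit_ols` and `fit_rob` are made up for this example.

```r
library(MASS)  # rlm() lives in MASS, which ships with standard R installations

# A1 replicates copied from the example data set (12 rows).
a1 <- data.frame(
  day = rep(c(1, 3, 5, 7), times = 3),
  od  = c(0.10, 1.00, 0.50, 0.70,
          0.13, 0.33, 0.54, 0.76,
          0.10, 0.35, 0.54, 0.73)
)

fit_ols <- lm(od ~ day, data = a1)   # ordinary least squares
fit_rob <- rlm(od ~ day, data = a1)  # M-estimation, Huber weights by default

# Final robustness weights: values below 1 mark downweighted observations.
round(fit_rob$w, 2)

# Predicted OD on day 3 under each fit.
new_day  <- data.frame(day = 3)
pred_ols <- predict(fit_ols, new_day)
pred_rob <- predict(fit_rob, new_day)
```

The reading of 1.0 on day 3 (row 2) should receive the smallest weight, so the robust prediction for day 3 sits closer to the other two replicates than the least-squares prediction does.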

library(ggplot2)

data = data.frame (day = c(1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7), od =
c(
0.1,1.0,0.5,0.7
,0.13,0.33,0.54,0.76
,0.1,0.35,0.54,0.73
,1.3,1.5,1.75,1.7
,1.3,1.3,1.0,1.6
,1.7,1.6,1.75,1.7
,2.1,2.3,2.5,2.7
,2.5,2.6,2.6,2.8
,2.3,2.5,2.8,3.8),
series_id = c(
"A1", "A1", "A1","A1",
"A1", "A1", "A1","A1",
"A1", "A1", "A1","A1",
"B1", "B1","B1", "B1",
"B1", "B1","B1", "B1",
"B1", "B1","B1", "B1",
"C1","C1", "C1", "C1",
"C1","C1", "C1", "C1",
"C1","C1", "C1", "C1"),
replicate = c(
"A1.1","A1.1","A1.1","A1.1",
"A1.2","A1.2","A1.2","A1.2",
"A1.3","A1.3","A1.3","A1.3",
"B1.1","B1.1","B1.1","B1.1",
"B1.2","B1.2","B1.2","B1.2",
"B1.3","B1.3","B1.3","B1.3",
"C1.1","C1.1","C1.1","C1.1",
"C1.2","C1.2","C1.2","C1.2",
"C1.3","C1.3","C1.3","C1.3"))

> data
   day   od series_id replicate
1    1 0.10        A1      A1.1
2    3 1.00        A1      A1.1
3    5 0.50        A1      A1.1
4    7 0.70        A1      A1.1
5    1 0.13        A1      A1.2
6    3 0.33        A1      A1.2
7    5 0.54        A1      A1.2
8    7 0.76        A1      A1.2
9    1 0.10        A1      A1.3
10   3 0.35        A1      A1.3
11   5 0.54        A1      A1.3
12   7 0.73        A1      A1.3
13   1 1.30        B1      B1.1
... etc...

This is what I have so far. It works nicely, but it does not remove outliers:
r <- ggplot(data = data, aes(x = day, y = od))
r + geom_point(aes(group = replicate, color = series_id)) + # add points
geom_line(aes(group = replicate, color = series_id)) + # add lines
geom_smooth(aes(group = series_id)) # add smoother, average of each replicate
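
One place to hook in a custom outlier rule is the `data` argument of `geom_smooth()`: flag suspicious rows first, then hand only the remaining rows to the smoother while still drawing every point. The sketch below assumes a deliberately crude rule, flagging any reading that differs from the median of its series and day by more than a fixed tolerance. `growth`, `tol` and `flagged` are hypothetical names, and the 0.5 threshold is an arbitrary choice for this data, not a recommendation.

```r
library(ggplot2)

# Same values as the example data set, built more compactly.
growth <- data.frame(
  day = rep(c(1, 3, 5, 7), times = 9),
  od  = c(0.10, 1.00, 0.50, 0.70,  0.13, 0.33, 0.54, 0.76,  0.10, 0.35, 0.54, 0.73,
          1.30, 1.50, 1.75, 1.70,  1.30, 1.30, 1.00, 1.60,  1.70, 1.60, 1.75, 1.70,
          2.10, 2.30, 2.50, 2.70,  2.50, 2.60, 2.60, 2.80,  2.30, 2.50, 2.80, 3.80),
  series_id = rep(c("A1", "B1", "C1"), each = 12)
)
growth$replicate <- paste0(growth$series_id, ".", rep(rep(1:3, each = 4), times = 3))

# Median of the replicates for each series/day combination, aligned to the rows.
group_median <- ave(growth$od, growth$series_id, growth$day, FUN = median)

tol <- 0.5  # assumed tolerance in OD units; tune for real data
growth$flagged <- abs(growth$od - group_median) > tol

p <- ggplot(growth, aes(x = day, y = od)) +
  geom_point(aes(color = series_id, shape = flagged)) +         # all points, flagged ones marked
  geom_line(aes(group = replicate, color = series_id)) +        # all replicate lines
  geom_smooth(data = growth[!growth$flagged, ],                 # smoother sees unflagged rows only
              aes(group = series_id))
```

With only three replicates per day, any such rule is fragile, which is one reason a robust smoother may be preferable to deleting points outright.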

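Edit: I have just added the two charts below, which show examples of the outlier problem in my real data rather than in the example data above.

The first plot shows series p26s4. Around day 32 something really odd happened in two of the replicates, producing 2 outliers.

The second plot shows series p22s5. On day 18 the readings for that day look strange, which I think is probably a machine error.

At the moment I am eyeballing the data to check that the growth curves look OK. After taking Hadley's advice and setting family = "symmetric", I am confident that the loess smoother does a decent job of ignoring the outliers.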

[Figure: series p26s4, with 2 outliers in two of the replicates around day 32]

[Figure: series p22s5, with anomalous readings on day 18, probably machine error]

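@Peter/@hadley, the next thing I would like to do is fit a logistic, Gompertz or Richards growth curve to this data instead of a loess, and calculate the growth rate in the exponential phase. Eventually I plan to use the grofit package in R ( http://cran.r-project.org/web/packages/grofit/index.html), but for now I would like to plot these manually with ggplot2 if possible. Any pointers would be much appreciated.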
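
As a rough pointer for the manual route, a logistic curve can be fitted with `nls()` and the self-starting model `SSlogis()` from the stats package, then drawn as an extra ggplot2 layer. The sketch below uses synthetic data (not the real series) and is not how grofit does it; `toy`, `grid` and `max_slope` are names invented for the example.

```r
library(ggplot2)

# Synthetic growth curve (not the asker's data): logistic shape plus a small
# deterministic wiggle standing in for measurement noise.
toy <- data.frame(day = 0:30)
toy$od <- 2 / (1 + exp((12 - toy$day) / 3)) + 0.02 * sin(toy$day)

# SSlogis() is the self-starting logistic model in the stats package:
#   od = Asym / (1 + exp((xmid - day) / scal))
fit <- nls(od ~ SSlogis(day, Asym, xmid, scal), data = toy)
est <- coef(fit)

# Steepest slope of a logistic curve, reached at day = xmid.
max_slope <- unname(est["Asym"] / (4 * est["scal"]))

# Evaluate the fitted curve on a fine grid and overlay it on the points.
grid <- data.frame(day = seq(0, 30, length.out = 200))
grid$od <- predict(fit, newdata = grid)

p <- ggplot(toy, aes(day, od)) +
  geom_point() +
  geom_line(data = grid, colour = "blue")
```

In this parameterisation `1 / scal` is the rate constant of the early, roughly exponential phase. The stats package also provides `SSgompertz()` for a Gompertz curve, which could be swapped into the same pattern.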

Best answer

Have you tried the family = "symmetric" argument to geom_smooth (which is in turn passed on to loess)? It makes the loess smooth resistant to outliers.
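
A minimal sketch of that suggestion, on synthetic data with one gross outlier rather than the asker's series. In current ggplot2 releases, arguments for the fitting function are supplied through `method.args`; the `toy` data frame and the variable names are assumptions made for the example.

```r
library(ggplot2)

# Synthetic example (not the asker's data): smooth trend with one gross outlier.
x <- 1:30
y <- 0.1 * x + 0.05 * sin(x)
y[15] <- 10  # the undisturbed value here would be close to 1.5
toy <- data.frame(x = x, y = y)

# Fit loess directly to compare the two families at the outlier's position.
fit_default <- loess(y ~ x, data = toy)                        # family = "gaussian"
fit_robust  <- loess(y ~ x, data = toy, family = "symmetric")  # robust re-weighting

pred_default <- predict(fit_default, data.frame(x = 15))
pred_robust  <- predict(fit_robust,  data.frame(x = 15))

# The same choice inside ggplot2: red is the default fit, blue the robust one.
p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE, colour = "red") +
  geom_smooth(method = "loess", se = FALSE, colour = "blue",
              method.args = list(family = "symmetric"))
```

The default fit is pulled toward the outlier, while the robust fit should stay close to the underlying trend at that position.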

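However, looking at your data, why do you think a linear fit is inadequate? You have only 4 x values, and there certainly does not seem to be strong evidence of a departure from linearity.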

For "r - R: How to remove outliers from a smoother in ggplot2?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2612495/
