
r - Values dropped from ggplot2 histogram when specifying limits

Reposted. Author: 行者123. Updated: 2023-12-04 11:01:02

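I would like to create a ggplot2 histogram in which the plot limits equal the minimum and maximum values in the dataset, without excluding those values from the actual histogram.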

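I get the behavior I am looking for with base graphics. Specifically, the second histogram below shows all the same values as the first (i.e., no bins are excluded in the second histogram), even though I passed an xlim argument to the second plot: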

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)

hist(mtcars$wt, breaks = 30, main = "No limits added")

hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")

[Image: base-graphics histogram, "No limits added"]
[Image: base-graphics histogram, "Limits added"]

ggplot2 does not give me this behavior:
library(ggplot2)

# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")

p + xlim(xlim) + ggtitle("Limits added")

[Image: ggplot2 histogram, "No limits added"]
[Image: ggplot2 histogram, "Limits added"]

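See how, in the second plot, I lose one of the points below 2 and two of the points above 5? I would like to know how to fix this. A few miscellaneous notes: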

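First, specifying boundary lets me include the minimum values (i.e., those below 2) in the histogram, but it still does not fix the 2 dropped values that are greater than 5: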
ggplot(mtcars, aes(x = wt)) + 
geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
xlim(xlim) +
ggtitle("Limits added with boundary too")

[Image: ggplot2 histogram, "Limits added with boundary too"]

Second, whether the problem occurs depends on the value chosen for bins. For example, when I increase bins to 50, no values are dropped:
ggplot(mtcars, aes(x = wt)) + 
geom_histogram(bins = 50, colour = "green", boundary = min_wt) +
xlim(xlim) +
ggtitle("Limits added with boundary too, but with bins = 50")

[Image: ggplot2 histogram, "Limits added with boundary too, but with bins = 50"]

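Finally, I believe this issue is related to one raised on SO, "geom_histogram: wrong bins?", and discussed here: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think the problem comes down to "rounding error". I describe this error in more depth in my second post on this issue (the one that shows the plots): https://github.com/daattali/ggExtra/issues/81.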
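To make the rounding-error hypothesis concrete, here is a minimal sketch. It assumes bin edges are built by adding multiples of a fixed width to the boundary, which is only an approximation of what a binning routine does and not ggplot2's actual code; the name `breaks` is mine. It prints the floating-point gap between the last computed edge and the true maximum, which may or may not be zero on a given platform.

```r
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
bins <- 30

# Hypothetical edge construction: `bins` equal-width bins starting at min_wt
width <- (max_wt - min_wt) / bins
breaks <- min_wt + width * 0:bins

# Mathematically the last edge equals max_wt; print the floating-point gap
print(breaks[bins + 1] - max_wt, digits = 17)

# An exact comparison can fail where a tolerance-based one succeeds
breaks[bins + 1] >= max_wt
isTRUE(all.equal(breaks[bins + 1], max_wt))
```

If the gap is nonzero, a bin edge can land marginally outside the scale limits, which is the situation in which a strict out-of-bounds check would discard the bin.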

Here is my session info:
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] ggplot2_2.2.1

loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.3-2 scales_0.5.0.9000
[4] compiler_3.4.2 lazyeval_0.2.1 plyr_1.8.4
[7] tools_3.4.2 pillar_1.2.1 gtable_0.2.0
[10] tibble_1.4.2 yaml_2.1.16 Rcpp_0.12.15
[13] grid_3.4.2 rlang_0.2.0.9000 munsell_0.4.3

Best answer

Another option, mentioned by @eipi10 in the comments, is to change the oob (out of bounds) argument of scale_x_continuous.

Function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA.



The default is scales::censor(). You can change it to oob = scales::squish, which squishes out-of-bounds values into the range.
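The difference between the two functions is easiest to see on a toy vector. The sketch below uses made-up values and limits (the names `x`, `limits`, `censored` and `squished` are mine, not from the question) and calls the two scales functions directly.

```r
x <- c(1.2, 2.5, 5.8)   # illustrative values, not from mtcars
limits <- c(2, 5)       # illustrative scale limits

# censor: values outside the limits are replaced with NA
censored <- scales::censor(x, range = limits)
print(censored)  # NA 2.5 NA

# squish: values outside the limits are moved to the nearest limit
squished <- scales::squish(x, range = limits)
print(squished)  # 2.0 2.5 5.0
```

In a histogram, these functions are applied to the bin edges as well as to the data, so a bar whose edge falls just outside the limits is dropped under censor and kept under squish.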

Compare the following two plots.
p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")

warning: Removed 1 rows containing missing values (geom_bar).



[Image: ggplot2 histogram, "default: scales::censor"]
p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")

[Image: ggplot2 histogram, "using scales::squish"]

Your third ggplot, in which you specified a boundary but still had 2 values greater than 5 dropped, would look like this.
ggplot(mtcars, aes(x = wt)) + 
geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
scale_x_continuous(limits = xlim, oob = scales::squish) +
ggtitle("Limits added with boundary too") +
labs(subtitle = "scales::squish")

[Image: ggplot2 histogram, "Limits added with boundary too", subtitle "scales::squish"]
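One way to check which bins are affected without reading the plots by eye is to inspect the computed bin data with layer_data(). This is a sketch under assumptions: the names `wt_range`, `base_plot`, `censored_bins` and `squished_bins` are mine, and the number of affected bins may differ between ggplot2 versions, so no counts are claimed here.

```r
library(ggplot2)

wt_range <- range(mtcars$wt)  # same limits as xlim in the question
base_plot <- ggplot(mtcars, aes(x = wt)) + geom_histogram(bins = 30)

# Built bin data under the default oob (censor) and under squish
censored_bins <- layer_data(base_plot + scale_x_continuous(limits = wt_range))
squished_bins <- layer_data(
  base_plot + scale_x_continuous(limits = wt_range, oob = scales::squish)
)

# Bins whose edges became NA are the ones the geom drops with a warning
censored_bins[is.na(censored_bins$xmin) | is.na(censored_bins$xmax),
              c("count", "xmin", "xmax")]

# With squish, out-of-range edges are moved to the limits instead of becoming NA
anyNA(squished_bins$xmin) || anyNA(squished_bins$xmax)
```

Rows printed by the first check correspond to the bars reported in the "Removed ... rows containing missing values" warning.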

Hope this helps.

Regarding "r - Values dropped from ggplot2 histogram when specifying limits", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49204576/
