Why is the cumulative line above the histogram incorrect with `ggplot2`?

Reposted by: bug小助手 | Updated: 2023-10-25 19:45:31

I want to overlay the cumulative share of a column on a histogram with ggplot2, but the percentages are incorrect.

As you can see, the red line starts at 95%, whereas the first bin sits lower, at around 82%.

df <- structure(list(col = c(1.42221064814815, 0.709669201387851, 0.00864583333333333,  3.35221946759356, 0.0138087731489429, 0.101736111111111, 0.000459247684037244,  0.0291767592590164, 0.171842569443915, 0.171538472222509, 0.0708449074074074,  0.0234837962962963, 0.25262748842714, 0.386477071758774, 125.055696030094,  0.0696409606492078, 0.0938078703703704, 0.192905092592593, 0.0031709722208756,  0.227335300925705, 0.0134506134247338, 0.040787037037037, 0.266623020834393,  0.00225040509193032, 0.473669131944577, 0.130830208333554, 3.61516203703704,  0.130288240741248, 0.536915474536794, 0.00138538194475351, 0.0113888888888889,  3.26379307870236, 0.12810640046166, 0.392849537037589, 0.71517319444429,  0.112205289351167, 0.431413553241226, 0.0178086342579789, 2.69385361110999,  0.220277777777778, 0.00206320601756926, 0.0808217592592593, 0.13211086805496,  1.90881438657365, 2.04585710648033, 0.845706018518518, 0.0741087962962963,  0.428182499999249, 0.00403622685207261, 0.0592311111120162, 0.0682201851849203,  1.24485666666594, 0.0189236111111111, 0.0453356481481481, 7.11538414351918,  0.0155092592592593, 0.0541087962962963, 0.0759213078711872, 0.00378994212934264,  0.00767912037118717, 0.0622061574072749, 22.5055494907416, 0.0707319328713196,  0.0851041666666667, 0.285934664353176, 0.0116175694432524, 0.709232141204454,  1.05187328703701, 0.0052125925929458, 0.112268171296627, 0.0400231481481481,  0.0341140393526466, 0.225503703703483, 0.0834027777777778, 0.929739918981989,  0.403400393517481, 0.0825652893522271, 0.458994571759745, 0.07600548611195,  0.0985681712958548, 0.0385900578703041, 0.359117986110074, 0.922757222221957,  186.031066087962, 2.39154376157456, 0.499594907407407, 0.0130671296296296,  2.86927083333333, 0.00584490740740741, 0.619270625001302, 0.0142964004642434,  0.0854832175925926, 1.39854887731373, 1.51077546296296, 0.00819540509195239,  0.750400266203063, 233.781311967594, 0.340315266204653, 0.879955011573103,  2.82027777777778)), row.names = 
c(NA, -100L), class = "data.frame")

library(ggplot2); library(dplyr)

df %>%
  ggplot(aes(x = col)) +
  geom_histogram(aes(y = after_stat(cumsum(count / sum(count)))), breaks = 0:max(df$col, na.rm = T), binwidth = 1, fill = "blue", color = "black") +
  geom_line(stat = "bin", aes(y = after_stat(cumsum(count / sum(count)))), color = "red") +
  scale_y_continuous(labels = scales::percent) +
  coord_cartesian(xlim=c(0, 10), ylim=c(0, 1)) +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  scale_y_continuous(breaks = seq(0, 1, by = 0.1), labels = scales::percent)

Recommended answer

The issue is that geom_line uses the default number of bins (30), i.e. its binwidth is computed as roughly diff(range(x)) / 30, while for geom_histogram you have set bins of width 1.

If you want the same counts, you have to use the same binning for both layers.
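To see why the binning matters, here is a minimal base R sketch. The vector `x` and the helper `cum_share()` are hypothetical illustrations, not part of the question's data or of ggplot2, and the 30 equal-width bins only approximate what `stat = "bin"` chooses by default.

```r
# Hypothetical, heavily right-skewed sample (not the data from the question).
x <- c(0.1, 0.2, 0.4, 0.7, 0.9, 1.5, 2.5, 3.2, 22, 125)

# Cumulative share of observations per bin for a given set of breaks.
cum_share <- function(x, breaks) {
  counts <- hist(x, breaks = breaks, plot = FALSE)$counts
  cumsum(counts) / sum(counts)
}

# Bins of width 1 starting at 0, as in the histogram layer.
narrow <- cum_share(x, breaks = seq(0, ceiling(max(x)), by = 1))

# 30 equal-width bins spanning the data range, roughly the default binning.
wide <- cum_share(x, breaks = seq(min(x), max(x), length.out = 31))

narrow[1]  # share of values falling in the first width-1 bin
wide[1]    # share of values falling in the first, much wider bin
```

Because the first wide bin swallows several of the narrow bins, its cumulative share is already much higher, which is the same effect that makes the red line start above the first bar.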

Additionally, note that in your case there is no need to set the breaks. You can use boundary= or center= to set the starting position of the bins. For geom_line it is a bit more involved: here I use stage() to shift the x position of the line after the stat has been applied, but you could also use position = position_nudge(x = -.5) to achieve the same result.

library(ggplot2)

df |>
  ggplot(aes(x = col)) +
  geom_histogram(
    aes(y = after_stat(cumsum(count / sum(count)))),
    binwidth = 1, fill = "blue", color = "black",
    boundary = 0
  ) +
  geom_line(
    stat = "bin",
    aes(
      x = stage(col, after_stat = x - .5),
      y = after_stat(cumsum(count / sum(count)))
    ),
    color = "red",
    binwidth = 1,
    boundary = 0
  ) +
  coord_cartesian(xlim = c(0, 10), ylim = c(0, 1)) +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  scale_y_continuous(
    breaks = seq(0, 1, by = 0.1),
    labels = scales::percent
  )
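The position_nudge alternative mentioned in the answer could look roughly like the sketch below. It uses a small hypothetical data frame `df_demo` in place of the question's `df`, so the plotted values differ from the original figure; the point is only that both layers share `binwidth = 1` and `boundary = 0`.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data frame.
df_demo <- data.frame(col = c(0.1, 0.2, 0.4, 0.7, 0.9, 1.5, 2.5, 3.2, 22, 125))

p <- ggplot(df_demo, aes(x = col)) +
  geom_histogram(
    aes(y = after_stat(cumsum(count / sum(count)))),
    binwidth = 1, boundary = 0, fill = "blue", color = "black"
  ) +
  geom_line(
    stat = "bin",
    aes(y = after_stat(cumsum(count / sum(count)))),
    # Same binning as the histogram layer.
    binwidth = 1, boundary = 0,
    # Shift the line half a bin to the left instead of using stage().
    position = position_nudge(x = -0.5),
    color = "red"
  ) +
  coord_cartesian(xlim = c(0, 10), ylim = c(0, 1))

# Both layers should now report the same cumulative shares.
all.equal(layer_data(p, 1)$y, layer_data(p, 2)$y)
```

Comparing the computed layer data with layer_data() is a quick way to confirm that the two layers are binned identically before looking at the plot itself.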

EDIT: If you want the line to start at (0, 0), the easiest way is to switch to geom_freqpoly, which by default extends the line:

library(ggplot2)
library(dplyr, warn = FALSE)

df |>
  ggplot(aes(x = col)) +
  geom_histogram(
    aes(y = after_stat(cumsum(count / sum(count)))),
    binwidth = 1, fill = "blue", color = "black",
    boundary = 0
  ) +
  geom_freqpoly(
    aes(
      x = stage(col, after_stat = x + .5),
      y = after_stat(cumsum(count / sum(count)))
    ),
    binwidth = 1, color = "red",
    boundary = 0
  ) +
  coord_cartesian(xlim = c(0, 10), ylim = c(0, 1)) +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  scale_y_continuous(
    breaks = seq(0, 1, by = 0.1),
    labels = scales::percent
  )

More replies

Is it possible for the line to start at the point (0, 0), so that the first non-zero point passes through the top-right corner of the first bin?

Yup. You could switch to geom_freqpoly, which is basically a histogram that uses a line instead of bars.
