
r - Preventing a wrong density plot when coloring a histogram by group

Reposted. Author: 行者123. Last updated: 2023-12-04 17:18:44

Based on some dummy data, I created a histogram with a density plot:

library(ggplot2)

set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))

a + geom_histogram(aes(y = ..density..,
                       # color = sex
                       ),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               # aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

Basic Result

The histogram of weight should be colored according to sex, so I used aes(y = ..density.., color = sex) in geom_histogram():

a + geom_histogram(aes(y = ..density..,
                       color = sex
                       ),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               # aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

Scaled individual histograms (not desired)

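As I want, the density plot stays the same (overall, across both groups), but the histogram is scaled up (each group now seems to be treated separately):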

How can I prevent this? I need individually colored histogram bars, but a joint density plot over all colored groups.

P.S. Using aes(color = sex) in geom_density() brings everything back to the original scale, but I don't want individual density plots (as shown below):

a + geom_histogram(aes(y = ..density..,
                       color = sex
                       ),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

Individual densities (not desired)

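Edit:

As suggested, dividing by the number of groups in geom_histogram()'s aesthetics, with y = ..density../2, approximates the solution. However, this only works for symmetric distributions, as in the first output below: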

a + geom_histogram(aes(y = ..density../2,
                       color = sex
                       ),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

which yields

Solution

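However, less symmetric distributions can cause problems with this approach. See below, where there are 5 groups and y = ..density../5 was used. First the original, then the manipulated version (using position = "stack"):

Original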

Divided by 5

Because the distribution is heavy on the left, dividing by 5 underestimates on the left and overestimates on the right.
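One possible way to see why dividing by the number of groups is only approximate is to redo the arithmetic outside ggplot2. The following is a minimal base-R sketch, not code from the question: the two groups, their sizes (150 and 50) and the bin width of 0.5 are assumptions chosen for illustration. It suggests that the per-group densities add up to the joint density only when each group is weighted by its share of the observations, which coincides with dividing by the number of groups when the groups are equally large.

```r
# Hypothetical data: two groups of unequal size (not from the question)
set.seed(1)
n_a <- 150
n_b <- 50
x <- c(rnorm(n_a, 55), rnorm(n_b, 58))
g <- rep(c("A", "B"), times = c(n_a, n_b))

# Common breaks so that all histograms use identical bins
binwidth <- 0.5
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)

# Joint density per bin: all observations share one normalization
joint <- hist(x, breaks = breaks, plot = FALSE)$density

# Per-group densities: each group is normalized on its own
dens_a <- hist(x[g == "A"], breaks = breaks, plot = FALSE)$density
dens_b <- hist(x[g == "B"], breaks = breaks, plot = FALSE)$density

# Dividing by the number of groups treats both groups as equally large
by_k <- (dens_a + dens_b) / 2

# Weighting by each group's share of the observations instead
by_share <- dens_a * n_a / (n_a + n_b) + dens_b * n_b / (n_a + n_b)

# Compare both reconstructions with the joint density
print(all.equal(joint, by_share))
print(isTRUE(all.equal(joint, by_k)))
```

Under these assumptions, the share-weighted sum reproduces the joint density, while the divide-by-two version does not.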

Edit 2: Solution

As suggested by Andrew, the following (complete) code solves the problem:

library(ggplot2)
set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25
a <- ggplot(wdata,
            aes(x = weight,
                # Pass binwidth to aes() so it will be found in
                # geom_histogram()'s aes() later
                binwidth = binwidth))

# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
                   binwidth = binwidth,
                   colour = "black",
                   fill = "white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25))


# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
                       # binwidth will only be found if passed to
                       # ggplot()'s aes() (as above)
                       y = ..count.. / (sum(..count..) * binwidth),
                       color = sex),
                   binwidth = binwidth,
                   fill = "white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25)) +
  # "none" replaces the deprecated FALSE for hiding a legend
  guides(color = "none")

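Note: binwidth = binwidth needs to be passed to ggplot()'s aes(); otherwise the pre-specified binwidth will not be found by geom_histogram()'s aes(). In addition, position = "stack" is specified so that the two versions of the histogram are comparable. Plots for the dummy data and for a more complex distribution follow: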

Correct, ungrouped, simple data

Correct, grouped, simple data

Correct, ungrouped, more complex distribution

Correct, grouped, more complex distribution

Solved, thanks for your help!

Best answer

I don't think you can do this with y = ..density.., but you can recreate the same thing like this...

binwidth <- 0.25 # easiest to set this manually so that you know what it is

a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
                       color = sex),
                   binwidth = binwidth,
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))
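The expression above appears to work because sum(..count..) is evaluated over the bins of all groups in the layer, so every bar is normalized by the total number of observations rather than by its own group's size. The sketch below restates the same idea with after_stat(), assuming ggplot2 3.3.0 or later, and uses the width variable computed by stat_bin() in place of the manually defined binwidth inside aes(). Both substitutions are assumptions of this sketch, not part of the original answer. The final line checks that the bar areas over both groups sum to 1.

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25

p <- ggplot(wdata, aes(x = weight)) +
  geom_histogram(
    # count and width are per-bin values computed by stat_bin();
    # sum(count) runs over the bins of all groups in this layer
    aes(y = after_stat(count / (sum(count) * width)), color = sex),
    binwidth = binwidth,
    fill = "white",
    position = "identity"
  ) +
  # No colour mapping here, so a single joint density is drawn
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

# Sanity check: total bar area across both groups
bars <- layer_data(p, 1)
total_area <- sum(bars$y * (bars$xmax - bars$xmin))
print(total_area)
```

Printing p draws the plot. Since the histogram is normalized jointly, its scale should match the joint density curve.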


Regarding "r - Preventing a wrong density plot when coloring a histogram by group", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54352106/
