
R ggplot2 histogram overlay with normalized values for each histogram

Reposted. Author: 行者123. Updated: 2023-12-05 07:36:52

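I want to create a histogram that compares three groups. However, I want to normalize each histogram by the total number of counts within its own group, not by the total number of counts across all groups. Here is my code: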

library(ggplot2)
library(reshape2)
# Create the dataset
set.seed(9)
df <- data.frame(values = c(runif(400, 20, 50), runif(300, 40, 80), runif(600, 0, 30)),
                 labels = c(rep("med", 400), rep("high", 300), rep("low", 600)))

levs <- c("low", "med", "high")
df$labels <- factor(df$labels, levels = levs)

ggplot(df, aes(x = values, fill = labels)) +
  geom_histogram(aes(y = after_stat(density)),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2,
                 position = "identity")

This produces a histogram that appears to be normalized by density. [plot]

However, I decided to cross-check this density against a manual calculation of my own. To do that, I used the following code:

# Separate the low, medium, and high groups
df1 <- df[df$labels == "low", ]
df2 <- df[df$labels == "med", ]
df3 <- df[df$labels == "high", ]

# Create a histogram for each group, normalized by that group's total number of counts
# (plot = FALSE bins the data without drawing; breaks[-1] is each bin's upper edge)
hist_temp <- hist(df1$values, breaks = seq(0, 80, by = 2), plot = FALSE)
tdf <- data.frame(hist_temp$breaks[-1], hist_temp$counts)
colnames(tdf) <- c("bins", "counts")
tdf$norm <- tdf$counts / sum(tdf$counts)
low1 <- tdf

hist_temp <- hist(df2$values, breaks = seq(0, 80, by = 2), plot = FALSE)
tdf <- data.frame(hist_temp$breaks[-1], hist_temp$counts)
colnames(tdf) <- c("bins", "counts")
tdf$norm <- tdf$counts / sum(tdf$counts)
med1 <- tdf

hist_temp <- hist(df3$values, breaks = seq(0, 80, by = 2), plot = FALSE)
tdf <- data.frame(hist_temp$breaks[-1], hist_temp$counts)
colnames(tdf) <- c("bins", "counts")
tdf$norm <- tdf$counts / sum(tdf$counts)
high1 <- tdf

# Combine the normalized histograms and melt them into long format for plotting
Tdata <- data.frame(low1$bins, low1$norm, med1$norm, high1$norm)
colnames(Tdata) <- c("bin", "low", "med", "high")
Tdata <- melt(Tdata, id.vars = "bin")

levs <- c("low", "med", "high")
Tdata$variable <- factor(Tdata$variable, levels = levs)

# Plot the data
ggplot(Tdata, aes(x = bin, y = value, group = variable, colour = variable)) +
  geom_line()

This produces: [plot]
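The three near-identical blocks in the manual check can be collapsed into one loop over the groups. A minimal base-R sketch, assuming the same data, factor levels, and 2-unit breaks as the question; the names brks and props are illustrative, not from the original:

```r
# Recreate the question's data so the block runs on its own.
set.seed(9)
df <- data.frame(
  values = c(runif(400, 20, 50), runif(300, 40, 80), runif(600, 0, 30)),
  labels = factor(c(rep("med", 400), rep("high", 300), rep("low", 600)),
                  levels = c("low", "med", "high"))
)

brks <- seq(0, 80, by = 2)

# For each group: bin with the shared breaks, then divide the counts by that
# group's own total so each group's proportions sum to 1.
props <- lapply(split(df$values, df$labels), function(v) {
  h <- hist(v, breaks = brks, plot = FALSE)
  data.frame(bin = h$breaks[-1],               # upper bin edge, as in the original
             value = h$counts / sum(h$counts))
})

# Stack the per-group frames into one long frame with a 'variable' column,
# matching the column names that melt() produced.
Tdata <- do.call(rbind, Map(function(d, g) cbind(d, variable = g),
                            props, names(props)))
Tdata$variable <- factor(Tdata$variable, levels = levels(df$labels))
```

Because Tdata ends up with the same bin, value, and variable columns as the melted version, the same ggplot(Tdata, ...) + geom_line() call should plot it unchanged.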

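As you can see, the two differ substantially, and I don't understand why. The y-axes of both should be the same, but they are not. So, assuming I haven't made some silly math mistake, I want the histogram to look like the line chart, but I can't think of a way to achieve that. Any help is appreciated, and thanks in advance.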
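One likely source of the mismatch is the definition of density itself. A histogram density is count / (n * binwidth), so the bar areas sum to 1, whereas the line chart plots count / n, so the bar heights sum to 1; with 2-unit bins the density is exactly half as tall. ggplot2's computed density for geom_histogram uses the same definition, per group. A quick base-R check, using a stand-in for the "low" group (the variable names are illustrative):

```r
# A stand-in for the question's "low" group (not the exact same draws).
set.seed(9)
low <- runif(600, 0, 30)
h <- hist(low, breaks = seq(0, 80, by = 2), plot = FALSE)

prop <- h$counts / sum(h$counts)  # what the line chart plots: heights sum to 1
binwidth <- diff(h$breaks)        # every bin is 2 units wide here

# hist() defines density as counts / (n * binwidth): areas sum to 1.
all.equal(h$density, prop / binwidth)  # TRUE
```

So a density histogram and the count / n line chart differ by a factor equal to the bin width, which is why their y-axes disagree.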


Edit: adding more examples that did not work:

I also tried normalizing with ..count../sum(..count..) (written after_stat(count / sum(count)) in current ggplot2 versions), using this code:

# Histogram where each histogram is divided by the total count of all groups
ggplot(df, aes(x = values, fill = labels, group = labels)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2,
                 position = "identity")

The result: [plot]

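This only normalizes by the total count of all the histograms combined, so it doesn't reflect what I see in the line chart either. I also tried substituting ..ncount.. for ..count.. (in the numerator, in the denominator, and in both), and none of these reproduced the result shown in the line chart.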
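The grand total appears because after_stat() is evaluated on the computed data for all groups of the layer at once, so sum(count) adds up every group's bins. A sketch of one workaround, assuming ggplot2 3.3.0 or later (where after_stat() was introduced): divide each bin by its own group's total using base R's ave(). This is a substitute technique, not one taken from the original thread:

```r
library(ggplot2)

# Recreate the question's data so the block runs on its own.
set.seed(9)
df <- data.frame(
  values = c(runif(400, 20, 50), runif(300, 40, 80), runif(600, 0, 30)),
  labels = factor(c(rep("med", 400), rep("high", 300), rep("low", 600)),
                  levels = c("low", "med", "high"))
)

# Inside after_stat(), `count` holds the binned counts of every group in the
# layer together, so sum(count) is the total over all 1300 rows.
# ave(count, group, FUN = sum) instead returns, for each bin, the total of the
# group that bin belongs to, which turns counts into per-group proportions.
p <- ggplot(df, aes(x = values, fill = labels)) +
  geom_histogram(aes(y = after_stat(count / ave(count, group, FUN = sum))),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2,
                 position = "identity")

# print(p) draws the plot; layer_data(p) returns the computed bar heights.
```

Because it stays a single layer mapped to fill = labels, the legend should follow the factor levels set on df$labels.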

I also tried position="stack" instead of position="identity", using the following code:

ggplot(df, aes(x = values, fill = labels, group = labels)) +
  geom_histogram(aes(y = after_stat(density)),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2,
                 position = "stack")

which gives this result: [plot]

This doesn't reflect the values shown in the line chart either.


Progress! Using the approach outlined in this post by Joran, I can now produce histograms that match the line chart. Here is the code:

# Plot where each histogram is normalized by its own counts
ggplot(df, aes(x = values, fill = labels, group = labels)) +
  geom_histogram(data = subset(df, labels == "high"),
                 aes(y = after_stat(count / sum(count))),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2) +
  geom_histogram(data = subset(df, labels == "med"),
                 aes(y = after_stat(count / sum(count))),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2) +
  geom_histogram(data = subset(df, labels == "low"),
                 aes(y = after_stat(count / sum(count))),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2) +
  scale_fill_manual(values = c("blue", "red", "green"))

This produces the following plot: [plot]

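However, I still can't reorder the data so that the legend shows "low", then "med", then "high" instead of alphabetical order. I have already set the factor levels (see the first code block). Any ideas?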
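A sketch of one way to pin the legend order, assuming ggplot2 3.3.0 or later: build the per-group layers in a loop and pass breaks (plus named values) to scale_fill_manual(). The breaks argument sets the legend order directly instead of relying on the factor levels. The colours reproduce the level-order mapping of the original unnamed values vector (low = blue, med = red, high = green); the helper name layers is illustrative:

```r
library(ggplot2)

# Recreate the question's data so the block runs on its own.
set.seed(9)
df <- data.frame(
  values = c(runif(400, 20, 50), runif(300, 40, 80), runif(600, 0, 30)),
  labels = factor(c(rep("med", 400), rep("high", 300), rep("low", 600)),
                  levels = c("low", "med", "high"))
)
levs <- c("low", "med", "high")

# One histogram layer per group. Each layer only contains its own group,
# so sum(count) inside after_stat() is that group's total.
layers <- lapply(levs, function(l) {
  geom_histogram(data = subset(df, labels == l),
                 aes(y = after_stat(count / sum(count))),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.2)
})

p <- ggplot(df, aes(x = values, fill = labels)) +
  layers +  # adding a list of layers adds each one in turn
  # Named values tie each colour to a level; breaks fixes the legend order.
  scale_fill_manual(values = c(low = "blue", med = "red", high = "green"),
                    breaks = levs)
```

Naming the values also keeps each colour attached to the same group if the layers are later reordered.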

Best answer

To use the counts of each category, perhaps try position="stack":

ggplot(df, aes(x = values, fill = labels)) +
  geom_histogram(aes(y = after_stat(density)),
                 breaks = seq(0, 80, by = 2),
                 alpha = 0.4,
                 position = "stack") +
  geom_density(alpha = 0.2, position = "stack")

It gives me this distribution [plot], but it still seems to differ from your second plot.

For a similar question on Stack Overflow about R ggplot2 histogram overlays with normalized values for each histogram, see: https://stackoverflow.com/questions/48922858/
