gpt4 book ai didi

r - geom_density2d 的重量当量

转载 作者:行者123 更新时间:2023-12-04 01:12:49 25 4
gpt4 key购买 nike

考虑以下数据:

   contesto       x       y perc
1 M01 81.370 255.659 22
2 M02 85.814 242.688 16
3 M03 73.204 240.526 33
4 M04 66.478 227.916 46
5 M04a 67.679 218.668 15
6 M05 59.632 239.325 35
7 M06 64.316 252.777 23
8 M08 90.258 227.676 45
9 M09 100.707 217.828 58
10 M10 89.829 205.278 53
11 M11 114.998 216.747 15
12 M12 119.922 235.482 18
13 M13 129.170 239.205 36
14 M14 142.501 229.717 24
15 M15 76.206 213.144 24
16 M16 30.090 166.785 33
17 M17 130.731 219.989 56
18 M18 74.885 192.336 36
19 M19 48.823 142.645 32
20 M20 48.463 186.361 24
21 M21 74.765 205.698 16

我想为由 perc 加权的点 x 和 y 创建一个二维密度图。我可以通过使用 rep 来做到这一点(虽然我认为不正确)如下:
library(ggplot2)

dataset2 <- with(dataset, dataset[rep(1:nrow(dataset), perc),])

ggplot(dataset2, aes(x, y)) +
stat_density2d(aes(alpha=..level.., fill=..level..), size=2,
bins=10, geom="polygon") +
scale_fill_gradient(low = "yellow", high = "red") +
scale_alpha(range = c(0.00, 0.5), guide = FALSE) +
geom_density2d(colour="black", bins=10) +
geom_point(data = dataset) +
guides(alpha=FALSE) + xlim(c(10, 160)) + ylim(c(120, 280))

enter image description here

这似乎不是正确的方法,因为其他 geom s 允许权重如下:
dat <- as.data.frame(ftable(mtcars$cyl))
ggplot(dat, aes(x=Var1)) + geom_bar(aes(weight=Freq))

但是,如果我尝试在此处使用权重,则该图与数据不匹配(desc 被忽略):
ggplot(dataset, aes(x, y)) + 
stat_density2d(aes(alpha=..level.., fill=..level.., weight=perc),
size=2, bins=10, geom="polygon") +
scale_fill_gradient(low = "yellow", high = "red") +
scale_alpha(range = c(0.00, 0.5), guide = FALSE) +
geom_density2d(colour="black", bins=10, aes(weight=perc)) +
geom_point(data = dataset) +
guides(alpha=FALSE) + xlim(c(10, 160)) + ylim(c(120, 280))

enter image description here

这是 rep的用途吗?加权密度的正确方法,或者是否有类似于 weight 的更好方法 geom_bar 的论据?
rep方法看起来像用基本 R 制作的内核密度,所以我认为它应该是这样的:

enter image description here
dataset <- structure(list(contesto = structure(1:21, .Label = c("M01", "M02", 
"M03", "M04", "M04a", "M05", "M06", "M08", "M09", "M10", "M11",
"M12", "M13", "M14", "M15", "M16", "M17", "M18", "M19", "M20",
"M21"), class = "factor"), x = c(81.37, 85.814, 73.204, 66.478,
67.679, 59.632, 64.316, 90.258, 100.707, 89.829, 114.998, 119.922,
129.17, 142.501, 76.206, 30.09, 130.731, 74.885, 48.823, 48.463,
74.765), y = c(255.659, 242.688, 240.526, 227.916, 218.668, 239.325,
252.777, 227.676, 217.828, 205.278, 216.747, 235.482, 239.205,
229.717, 213.144, 166.785, 219.989, 192.336, 142.645, 186.361,
205.698), perc = c(22, 16, 33, 46, 15, 35, 23, 45, 58, 53, 15,
18, 36, 24, 24, 33, 56, 36, 32, 24, 16)), .Names = c("contesto",
"x", "y", "perc"), row.names = c(NA, -21L), class = "data.frame")

最佳答案

我认为你做得对,如果你的权重是每个坐标(或按比例)的 # 个观察值。该函数似乎期待所有观察结果,如果您在原始数据集上调用它,则无法动态更新 ggplot 对象,因为它已经对密度进行了建模,并且包含派生的绘图数据。

您可能想使用 data.table而不是 with()如果您的真实数据集很大,它会快 70 倍左右。例如请参阅此处了解 1m 坐标,重复 1-20 次(在此示例中为 >10m 观测值)。但是,与 660 次观察没有性能相关性(无论如何,该图可能会成为大型数据集的性能瓶颈)。

bigtable<-data.frame(x=runif(10e5),y=runif(10e5),perc=sample(1:20,10e5,T))

system.time(rep.with.by<-with(bigtable, bigtable[rep(1:nrow(bigtable), perc),]))
#user system elapsed
#11.67 0.18 11.92

system.time(rep.with.dt<-data.table(bigtable)[,list(x=rep(x,perc),y=rep(y,perc))])
#user system elapsed
#0.12 0.05 0.18

# CHECK THEY'RE THE SAME
sum(rep.with.dt$x)==sum(rep.with.by$x)
#[1] TRUE

# OUTPUT ROWS
nrow(rep.with.dt)
#[1] 10497966

关于r - geom_density2d 的重量当量,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21273525/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com