
r - Plotting the distribution of a variable over time - cumulative addition

Reposted. Author: 行者123. Updated: 2023-12-04 01:02:13

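I'm trying to do something that is fairly difficult for my level of R expertise. I have a date variable that records when each respondent started my survey. It therefore contains missing values (non-respondents) and dates (respondents).

What I want to plot is how a specific variable (for example, the percentage of women) develops over the time frame I have. Put simply: a chart showing X% women on the first day, Y% on the second day (including the people from the first day), and so on, for each of the three experimental groups available.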

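I'm at a loss. I looked at some resources that use rainfall or other natural phenomena as examples and tried the cumsum() command together with ggplot, but that doesn't seem to be the way to get what I need. I'm not even sure whether I need another package.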
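To make the target quantity concrete, here is a minimal sketch of a running share computed with cumsum(). The vector is made up for illustration and is not taken from the survey; it assumes one element per respondent, already sorted by start time.

```r
# Toy data (made up): one entry per respondent, in order of start time
sex <- c("Female", "Male", "Female", "Female")

# Running count of women divided by running count of all respondents
running_female <- cumsum(sex == "Female")
running_total  <- seq_along(sex)
running_share  <- running_female / running_total

running_share
```

Each element is the share of women among everyone who had started up to that point, which is the "including the people from the previous days" behaviour described above.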

Here is the sample data:

df <- structure(list(sf_sex = c("Female", "Female", "Female", "Female", 
"Female", "Male", "Female", "Male", "Female", "Female", "Female",
"Female", "Male", "Female", "Male", "Female", "Male", "Male",
"Male", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Male", "Male", "Male", "Male", "Female", "Male", "Female",
"Male", "Male", "Male", "Female", "Male", "Female", "Male", "Male",
"Female", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Male", "Female", "Female", "Female", "Male", "Male",
"Female", "Male", "Male", "Male", "Female", "Male", "Male", "Female",
"Male", "Male", "Male", "Male", "Female", "Female", "Male", "Female",
"Female", "Female", "Female", "Male", "Female", "Female", "Male",
"Female", "Male", "Male", "Female", "Female", "Male", "Female",
"Male", "Female", "Female", "Male", "Male", "Female", "Male",
"Female", "Male", "Male", "Female", "Male", "Female", "Female",
"Female"), StartDate = c("06/07/2019", "06/06/2019", NA, "05/21/2019",
NA, NA, "05/24/2019", NA, NA, "05/20/2019", NA, "06/04/2019",
NA, NA, NA, NA, "06/16/2019", NA, NA, NA, "05/23/2019", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "05/23/2019",
NA, NA, NA, NA, NA, NA, NA, NA, NA, "05/22/2019", NA, "06/13/2019",
NA, NA, "05/28/2019", "05/23/2019", NA, NA, NA, NA, NA, NA, "05/29/2019",
"05/22/2019", NA, "05/23/2019", NA, "05/31/2019", NA, "05/22/2019",
NA, "07/02/2019", "06/02/2019", NA, NA, "05/27/2019", NA, NA,
NA, "05/27/2019", NA, NA, NA, NA, "06/04/2019", "05/22/2019",
NA, NA, "05/24/2019", NA, "05/25/2019", "05/21/2019", "05/20/2019",
NA, NA, "05/24/2019", NA, NA, "06/03/2019", "05/22/2019", "05/20/2019"
)), row.names = c(2L, 9L, 12L, 23L, 24L, 38L, 48L, 49L, 52L,
53L, 55L, 68L, 71L, 75L, 84L, 90L, 107L, 114L, 115L, 117L, 118L,
122L, 125L, 134L, 138L, 144L, 148L, 163L, 169L, 179L, 185L, 188L,
199L, 206L, 209L, 211L, 223L, 227L, 230L, 233L, 234L, 237L, 241L,
243L, 247L, 257L, 269L, 275L, 277L, 284L, 287L, 288L, 291L, 292L,
295L, 301L, 310L, 314L, 316L, 324L, 329L, 331L, 333L, 338L, 341L,
344L, 363L, 365L, 372L, 373L, 375L, 385L, 400L, 401L, 411L, 416L,
421L, 423L, 427L, 429L, 439L, 440L, 443L, 444L, 455L, 465L, 468L,
479L, 504L, 511L, 518L, 522L, 528L, 529L, 530L, 538L, 541L, 542L,
543L, 554L), class = "data.frame")

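The NA cases should be ignored, since those are people who did not participate.

Sorry if the data takes up too much space, and thank you very much for your help.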
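As an illustration of what ignoring the NA cases could look like, here is a minimal base-R sketch. It uses a small made-up data frame in the same shape as df (the `toy` and `respondents` names are hypothetical) and assumes the dates are month/day/year strings, as in the sample.

```r
# Small made-up sample in the same shape as df (not the real survey data)
toy <- data.frame(
  sf_sex    = c("Female", "Male", "Female", "Male"),
  StartDate = c("05/21/2019", NA, "05/20/2019", "06/04/2019"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date; NA stays NA
toy$date <- as.Date(toy$StartDate, format = "%m/%d/%Y")

# Keep respondents only, then sort chronologically
respondents <- toy[!is.na(toy$date), ]
respondents <- respondents[order(respondents$date), ]

respondents
```

Dropping the rows before computing any cumulative counts keeps non-respondents out of both the numerator and the denominator.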

Best answer

This is a well-formulated question!

Here is my solution, with comments for some explanation. Let me know if anything is unclear.

library(dplyr)
library(ggplot2)

df %>%
  # convert StartDate from character to a sortable Date
  mutate(date = lubridate::mdy(StartDate)) %>%
  # drop non-respondents (missing StartDate)
  filter(!is.na(date)) %>%
  arrange(date) %>%
  # numerator and denominator of the cumulative proportion female
  mutate(Rs = cumsum(sf_sex %in% c("Male", "Female")),
         female_Rs = cumsum(sf_sex == "Female")) %>%
  # keep the last observation per date
  group_by(date) %>%
  slice(n()) %>%
  ungroup() %>%
  select(date, Rs, female_Rs) %>%
  # compute the proportion
  mutate(female_prop = female_Rs / Rs) %>%
  # plot it over time
  ggplot(aes(x = date, y = female_prop)) +
  geom_point() +
  geom_line()

(Image: the resulting plot of female_prop against date, drawn as points connected by a line.)
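The question also mentions three experimental groups, which the sample data does not include. Below is a minimal sketch of one way the same idea could be applied per group. It assumes a hypothetical `group` column and uses made-up data, so treat it as an illustration rather than part of the original answer.

```r
library(dplyr)

# Made-up data; `group` is a hypothetical column name for the experimental arm
toy <- data.frame(
  group  = c("A", "A", "B", "A", "B", "B"),
  sf_sex = c("Female", "Male", "Male", "Female", "Female", "Female"),
  date   = as.Date(c("2019-05-20", "2019-05-20", "2019-05-21",
                     "2019-05-22", "2019-05-22", "2019-05-23"))
)

by_group <- toy %>%
  filter(!is.na(date)) %>%
  arrange(group, date) %>%
  group_by(group) %>%
  # cumulative counts restart within each group
  mutate(Rs = row_number(),
         female_Rs = cumsum(sf_sex == "Female")) %>%
  # keep the last row per group and date
  group_by(group, date) %>%
  slice(n()) %>%
  ungroup() %>%
  mutate(female_prop = female_Rs / Rs) %>%
  select(group, date, female_prop)

by_group
```

With a table like this, mapping colour = group inside aes() in the ggplot call would draw one line per group.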

This Q&A on "r - Plotting the distribution of a variable over time - cumulative addition" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/67884262/
