
r - How to calculate the average over a period of time, by hour?

Reposted. Author: 行者123. Updated: 2023-12-01 11:35:43

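I am new to R and have run into my first difficulty. I have a data set of ca. 10000 observations in which I record the occurrence of events over 365 days. The occurrences are only recorded for the first 14 days of each month. I would like to fill in the remaining 16 days by averaging, hour by hour, the occurrences recorded earlier in the respective month.

The structure is as follows: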

day hours occurrence
2000-01-01 1 5
2000-01-01 2 6
2000-01-01 3 7
... ... ...
2000-01-01 23 3
2000-01-01 24 2
... ... ...
2000-01-02 1 4
2000-01-02 2 2
2000-01-02 3 5
... ... ...
2000-01-02 23 2
2000-01-02 24 1
...
...
2000-01-15 1 average of the previous 1 hours((5+4+n)/2*k))
2000-01-15 2 average of the previous 2 hours ((6+2+n)/2*k))
2000-01-15 3 average of the previous 3 hours((7+5+n)/2*k))
... ... ...
2000-01-15 23 average of the previous 23 hours
2000-01-15 24 average of the previous 24 hours
... ... ...
... ... ...
2000-01-30
2000-01-30
2000-01-30
2000-01-30
... ... ...
... ... ...
2000-02-01
2000-02-01
2000-02-01
2000-02-01
... ... ...
...
... ... ...
2000-12-24

I tried

               aggregate( occurences ~ hours, mean) 

but the result makes no sense. I also tried

               tapply( X = occurences, INDEX = list(hours), FUN = Mean )

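Unfortunately, neither worked the way I imagined. I think the respective month needs to be included in the function, but my means seem limited.

Best answer

You can try this. Note that, to keep the example small, I only select data for days 1-4 of each month and hours 0-1. Occurrence data are present for day 1 and day 2 of each month and missing for day 3 and day 4.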

library(dplyr)

# create dummy data
set.seed(123) # for reproducibility of sample

d1 <- data.frame(time = seq(from = as.POSIXct("2000-01-01"),
                            to = as.POSIXct("2000-02-28"),
                            by = "hour"))
d1 <- d1 %>%
  mutate(hour = as.integer(format(time, "%H")),
         day = as.integer(format(time, "%d")), # <~~ only needed to generate sample data
         month = as.integer(format(time, "%m")),
         occurence = sample(1:10, length(time), replace = TRUE),
         occurence = ifelse(day %in% 1:2, occurence, NA)) %>% # <~~~ data only for day 1-2
  filter(hour %in% 0:1 & day %in% 1:4) %>% # <~~~ smaller example: select hour 0-1, day 1-4
  select(-day)

# calculate mean occurrence per month and hour
d2 <- d1 %>%
  group_by(month, hour) %>%
  summarise(mean_occ = round(mean(occurence, na.rm = TRUE), 1))
d2
#   month hour mean_occ
# 1     1    0      5.0
# 2     1    1      8.0
# 3     2    0      5.5
# 4     2    1      6.5


# replace missing occurrence with mean_occ
d3 <- d1 %>%
  left_join(d2, by = c("hour", "month")) %>%
  mutate(occurence2 = ifelse(is.na(occurence), mean_occ, occurence)) %>%
  select(-month, -mean_occ)

d3
#    hour                time occurence occurence2
# 1     0 2000-01-01 00:00:00         3        3.0
# 2     1 2000-01-01 01:00:00         8        8.0
# 3     0 2000-01-02 00:00:00         7        7.0
# 4     1 2000-01-02 01:00:00         8        8.0
# 5     0 2000-01-03 00:00:00        NA        5.0
# 6     1 2000-01-03 01:00:00        NA        8.0
# 7     0 2000-01-04 00:00:00        NA        5.0
# 8     1 2000-01-04 01:00:00        NA        8.0
# 9     0 2000-02-01 00:00:00         4        4.0
# 10    1 2000-02-01 01:00:00         6        6.0
# 11    0 2000-02-02 00:00:00         7        7.0
# 12    1 2000-02-02 01:00:00         7        7.0
# 13    0 2000-02-03 00:00:00        NA        5.5
# 14    1 2000-02-03 01:00:00        NA        6.5
# 15    0 2000-02-04 00:00:00        NA        5.5
# 16    1 2000-02-04 01:00:00        NA        6.5
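
The example above works because rows for the days without data already exist and hold NA. If the real data set only contains rows for the recorded days (days 1-14 in the question), those rows would probably have to be created first. A minimal base R sketch of that step follows. The column names day, hours and occurrence are assumed from the layout in the question, and the values, the two-hour range and the four-day period are made up to keep it small.

```r
# Hypothetical observed data: rows exist only for days 1-2 of January 2000
obs <- data.frame(day = rep(as.Date(c("2000-01-01", "2000-01-02")), each = 2),
                  hours = rep(1:2, times = 2),
                  occurrence = c(5, 6, 4, 2))

# Full grid: every day of the period combined with every hour of interest
all_days <- seq(as.Date("2000-01-01"), as.Date("2000-01-04"), by = "day")
grid <- data.frame(day = rep(all_days, each = 2),
                   hours = rep(1:2, times = length(all_days)))

# Left join: grid rows without a matching observation get NA in occurrence
full <- merge(grid, obs, by = c("day", "hours"), all.x = TRUE)
full <- full[order(full$day, full$hours), ]
full
```

The resulting data frame has one row per day and hour, so the missing values can then be replaced with a per-month, per-hour mean as in the answer.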

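For comparison, the same fill can be sketched without dplyr, using ave() from base R to compute a group mean that is aligned with the original rows. This is an alternative sketch, not the method from the answer. The data frame df, its column names and its values are made up to resemble the layout in the question, and grouping by year and month (instead of the month number alone) is an assumption made so that data spanning several years would not be mixed.

```r
# Hypothetical data: hours 1-2 on three days in each of two months,
# with occurrence recorded on the first two days only
days <- as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                  "2000-02-01", "2000-02-02", "2000-02-03"))
df <- data.frame(day = rep(days, each = 2),
                 hours = rep(1:2, times = length(days)),
                 occurrence = c(5, 6, 4, 2, NA, NA,
                                3, 8, 5, 4, NA, NA))

# Grouping key: year and month of each row
df$month <- format(df$day, "%Y-%m")

# Mean of the recorded occurrences per month and hour, repeated on every row
df$mean_occ <- ave(df$occurrence, df$month, df$hours,
                   FUN = function(x) mean(x, na.rm = TRUE))

# Keep recorded values, fill the gaps with the group mean
df$occurrence2 <- ifelse(is.na(df$occurrence), df$mean_occ, df$occurrence)
df
```

Because ave() returns a vector of the same length as its input, no separate summary table or join is needed. The trade-off is that the intermediate means are not available as a compact table like d2.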
Regarding "r - How to calculate the average over a period of time, by hour?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27404700/
