
r - Calculating statistics over a dynamic window using dplyr

Reposted. Author: 行者123. Updated: 2023-12-04 10:14:51

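I am trying to use dplyr in R to calculate rolling statistics (mean, standard deviation, etc.) over a dynamic window that is defined by date and by a grouping column. For example, within each label group, I want the rolling mean of all data from the preceding 10 days. The dates in the data are non-consecutive and incomplete, so I cannot use a fixed window.

One approach is to use rollapply with a vector of window widths, as shown below. However, I cannot work out how to calculate the dynamic widths. I would prefer a method that skips the intermediate step of computing the window and simply calculates from date_lookback. Here is a toy example.

I have done this with for loops, but they are very slow.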

    library(dplyr)
    library(zoo)

    date_lookback <- 10 # days to look back for rolling calcs

    df <- data.frame(label = c(rep("a", 5), rep("b", 5)),
                     date = as.Date(c("2017-01-02", "2017-01-20",
                                      "2017-01-21", "2017-01-30", "2017-01-31", "2017-01-05",
                                      "2017-01-08", "2017-01-09", "2017-01-10", "2017-01-11")),
                     data = c(790, 493, 718, 483, 825, 186, 599, 408, 108, 666),
                     stringsAsFactors = FALSE) %>%
      mutate(
        cut_date = date - date_lookback, # calcs based on sample since this date
        dyn_win = c(1, 1, 2, 3, 3, 1, 2, 3, 4, 5), ##!! need to calculate this vector??
        roll_mean = rollapply(data, align = "right", width = dyn_win, mean),
        roll_sd = rollapply(data, align = "right", width = dyn_win, sd))

These are the roll_mean and roll_sd results I am looking for:

    > df
       label       date data   cut_date dyn_win roll_mean  roll_sd
    1      a 2017-01-02  790 2016-12-23       1  790.0000       NA
    2      a 2017-01-20  493 2017-01-10       1  493.0000       NA
    3      a 2017-01-21  718 2017-01-11       2  605.5000 159.0990
    4      a 2017-01-30  483 2017-01-20       3  564.6667 132.8847
    5      a 2017-01-31  825 2017-01-21       3  675.3333 174.9467
    6      b 2017-01-05  186 2016-12-26       1  186.0000       NA
    7      b 2017-01-08  599 2016-12-29       2  392.5000 292.0351
    8      b 2017-01-09  408 2016-12-30       3  397.6667 206.6938
    9      b 2017-01-10  108 2016-12-31       4  325.2500 222.3921
    10     b 2017-01-11  666 2017-01-01       5  393.4000 245.5928
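
The dyn_win column in the question is typed in by hand. As a minimal sketch (not part of the original question), the widths could be derived by counting, within each label, the rows whose date lies between cut_date and date. This assumes dates are sorted and unique within each label; the result name `df_win` is only illustrative.

```r
library(dplyr)

date_lookback <- 10 # days to look back, as in the question

df <- data.frame(
  label = c(rep("a", 5), rep("b", 5)),
  date = as.Date(c("2017-01-02", "2017-01-20", "2017-01-21", "2017-01-30",
                   "2017-01-31", "2017-01-05", "2017-01-08", "2017-01-09",
                   "2017-01-10", "2017-01-11")),
  data = c(790, 493, 718, 483, 825, 186, 599, 408, 108, 666),
  stringsAsFactors = FALSE
)

df_win <- df %>%
  group_by(label) %>%
  arrange(date, .by_group = TRUE) %>% # assumption: windows look backwards in time
  mutate(
    cut_date = date - date_lookback,
    # for each row, count the rows of the same label dated in [cut_date, date]
    dyn_win = sapply(seq_along(date),
                     function(i) sum(date >= cut_date[i] & date <= date[i]))
  ) %>%
  ungroup()

print(df_win)
```

A vector computed this way could then be passed as `width` to rollapply inside the same grouped mutate, in place of the hard-coded values.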

Thanks in advance.

Best answer

You can try referencing your dataset explicitly inside the dplyr call:

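    library(dplyr)

    date_lookback <- 10 # days to look back for rolling calcs

    df <- data.frame(label = c(rep("a", 5), rep("b", 5)),
                     date = as.Date(c("2017-01-02", "2017-01-20",
                                      "2017-01-21", "2017-01-30", "2017-01-31", "2017-01-05",
                                      "2017-01-08", "2017-01-09", "2017-01-10", "2017-01-11")),
                     data = c(790, 493, 718, 483, 825, 186, 599, 408, 108, 666),
                     stringsAsFactors = FALSE)

    # Grouping by (date, label) makes `date` and `label` single values inside
    # mutate(), while df$date, df$label and df$data still refer to the full
    # columns. This relies on every (date, label) pair being unique.
    df %>%
      group_by(date, label) %>%
      mutate(
        # keep only values of the same label dated within [date - date_lookback, date]
        roll_mean = mean(ifelse(df$date >= date - date_lookback & df$date <= date & df$label == label,
                                df$data, NA), na.rm = TRUE),
        roll_sd = sd(ifelse(df$date >= date - date_lookback & df$date <= date & df$label == label,
                            df$data, NA), na.rm = TRUE))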
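
As a variation on the answer above (a sketch, not the answerer's code), the window can also be evaluated per label without referring back to `df` inside mutate. The helper name `window_stat` is hypothetical; it applies a function to the values dated within the lookback period of each row.

```r
library(dplyr)

date_lookback <- 10 # days to look back, as in the question

df <- data.frame(
  label = c(rep("a", 5), rep("b", 5)),
  date = as.Date(c("2017-01-02", "2017-01-20", "2017-01-21", "2017-01-30",
                   "2017-01-31", "2017-01-05", "2017-01-08", "2017-01-09",
                   "2017-01-10", "2017-01-11")),
  data = c(790, 493, 718, 483, 825, 186, 599, 408, 108, 666),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every date d, apply `fun` to the values whose
# date falls in [d - lookback, d].
window_stat <- function(dates, values, lookback, fun) {
  sapply(seq_along(dates), function(i) {
    in_window <- dates >= dates[i] - lookback & dates <= dates[i]
    fun(values[in_window])
  })
}

result <- df %>%
  group_by(label) %>% # the helper only sees rows of one label at a time
  mutate(
    roll_mean = window_stat(date, data, date_lookback, mean),
    roll_sd = window_stat(date, data, date_lookback, sd)
  ) %>%
  ungroup()

print(result)
```

Like the answer's version, this compares every row with every other row of its group, so the cost grows quadratically with group size. It does not require the (date, label) pairs to be unique.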

Regarding "r - Calculating statistics over a dynamic window using dplyr", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42960646/
