
r - Moving average with a varying window

Reposted. Author: 行者123. Updated: 2023-12-01 22:50:59

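I want to compute a moving average of a variable in R with a varying window size. More specifically, the moving average should cover three years, but the data (a time series) have a higher frequency, and the window size may differ from one three-year window to the next.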

Assume the following dataset:

library(data.table)
set.seed(1) # reproducible data
dataset <- data.table(ID = c(rep("A", 2208), rep("B", 2208)),
                      x = c(rnorm(2208 * 2)),
                      time = c(seq(as.Date("1988/03/15"), as.Date("2000/04/16"), "day"),
                               seq(as.Date("1988/03/15"), as.Date("2000/04/16"), "day")))

The three-year moving average of x should be computed for both IDs (A and B). Is this best done with zoo or data.table? Any solution is welcome, though.

Note that I know how to do this with a fixed window size; the problem here is that the window size varies.

Best answer

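If I understand correctly, the OP wants each window to span exactly 3 years. Since a 3-year span may contain a leap day (Feb 29), the window size is either 1095 or 1096 days.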
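The 1095/1096-day figures can be checked without lubridate. The sketch below is a base-R stand-in, assuming that "exactly 3 years" means the window ends the day before the same calendar date three years later, with a Feb 29 start rolled back to Feb 28 the way %m+% does. add_years_rollback() is a hypothetical helper written only for this illustration; it is not a function from lubridate or any other package.

```r
# Hypothetical helper (not from lubridate or any package): add n whole years
# to each Date, rolling Feb 29 back to Feb 28 when the target year is not a
# leap year (the roll-back that lubridate's %m+% years(n) applies).
add_years_rollback <- function(d, n) {
  lt <- as.POSIXlt(d)
  y  <- as.integer(lt$year + 1900L + n)
  m  <- as.integer(lt$mon + 1L)
  dd <- as.integer(lt$mday)
  leap <- (y %% 4L == 0L & y %% 100L != 0L) | y %% 400L == 0L
  # Feb 29 does not exist in a non-leap target year: roll back to Feb 28
  dd[m == 2L & dd == 29L & !leap] <- 28L
  as.Date(sprintf("%04d-%02d-%02d", y, m, dd))
}

# A window starting on `start` ends the day before the same date 3 years later
starts <- as.Date(c("1988-03-15", "1989-03-01", "1992-02-28", "1992-02-29"))
ends   <- add_years_rollback(starts, 3L) - 1
days   <- as.integer(ends - starts) + 1L  # inclusive window length
print(data.frame(start = starts, end = ends, days = days))
# days: 1095 1096 1096 1095
```

For whole-year offsets, Feb 29 is the only date that can land on a nonexistent day, so it is the only case the helper has to roll back.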

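This can be solved by aggregating in a non-equi join, combined with lubridate's date arithmetic, which rolls back nonexistent dates (%m+%).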
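If data.table is not available, the same per-window aggregation can be sketched in base R. Note that this swaps the non-equi join for a different technique, cumulative sums plus binary search with findInterval(): after one sort, each window mean needs only two binary searches. rolling_window_mean() and the toy data below are hypothetical, and windows are assumed to be inclusive at both ends, matching time >= start, time <= end in the join.

```r
# Hypothetical helper: mean of x over each window [start[i], end[i]]
# (both ends inclusive) for one series, via cumulative sums + binary search.
rolling_window_mean <- function(time, x, start, end) {
  o  <- order(time)
  tn <- as.numeric(time)[o]   # sorted observation times
  cs <- c(0, cumsum(x[o]))    # cs[k + 1] = sum of the first k sorted values
  # position of the first observation with time >= start
  lo <- findInterval(as.numeric(start), tn, left.open = TRUE) + 1L
  # position of the last observation with time <= end
  hi <- findInterval(as.numeric(end), tn)
  # an empty window gives 0 / 0 = NaN
  (cs[hi + 1L] - cs[lo]) / (hi - lo + 1L)
}

# Toy data: two IDs with 10 daily observations each
set.seed(1)
obs <- data.frame(ID = rep(c("A", "B"), each = 10),
                  time = rep(as.Date("2000-01-01") + 0:9, times = 2),
                  x = rnorm(20), stringsAsFactors = FALSE)
windows <- data.frame(ID = c("A", "A", "B"),
                      start = as.Date(c("2000-01-01", "2000-01-05", "2000-01-03")),
                      end   = as.Date(c("2000-01-03", "2000-01-10", "2000-01-04")),
                      stringsAsFactors = FALSE)

# Aggregate separately per ID, like `by = .EACHI` within each ID
windows$avg <- NA_real_
for (id in unique(windows$ID)) {
  d <- obs[obs$ID == id, ]
  w <- windows$ID == id
  windows$avg[w] <- rolling_window_mean(d$time, d$x, windows$start[w], windows$end[w])
}
print(windows)
```

The windows table here plays the role of win in the answer; on the full daily data it would hold one row per ID and start date.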

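library(data.table)
library(lubridate)
# create 3-year windows for each ID for the later non-equi join
# (cross join of every ID with every date as a candidate window start)
win <- dataset[, CJ(ID = ID, start = time, unique = TRUE)][
  # end = same calendar date 3 years later minus one day;
  # %m+% rolls a nonexistent Feb 29 back to Feb 28
  , end := start %m+% years(3) - days(1)][
  # remove windows which end beyond the last date in the data
  end <= max(start)]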
win
      ID      start        end
   1:  A 1988-03-15 1991-03-14
   2:  A 1988-03-16 1991-03-15
   3:  A 1988-03-17 1991-03-16
   4:  A 1988-03-18 1991-03-17
   5:  A 1988-03-19 1991-03-18
  ---
6638:  B 1997-04-13 2000-04-12
6639:  B 1997-04-14 2000-04-13
6640:  B 1997-04-15 2000-04-14
6641:  B 1997-04-16 2000-04-15
6642:  B 1997-04-17 2000-04-16
# check window lengths
win[, .N, by = .(days = end - start + 1L)]
        days    N
1: 1095 days 2166
2: 1096 days 4476
# see what happens in leap years
win[leap_year(start) & month(start) == 2 & day(start) %in% 28:29,
.(start, end, days = end - start + 1L)]
        start        end      days
1: 1992-02-28 1995-02-27 1096 days
2: 1992-02-29 1995-02-27 1095 days
3: 1996-02-28 1999-02-27 1096 days
4: 1996-02-29 1999-02-27 1095 days
5: 1992-02-28 1995-02-27 1096 days
6: 1992-02-29 1995-02-27 1095 days
7: 1996-02-28 1999-02-27 1096 days
8: 1996-02-29 1999-02-27 1095 days
win[leap_year(end) & month(end) == 2 & day(end) %in% 28:29,
.(start, end, days = end - start + 1L)]
        start        end      days
1: 1989-03-01 1992-02-29 1096 days
2: 1993-03-01 1996-02-29 1096 days
3: 1997-03-01 2000-02-29 1096 days
4: 1989-03-01 1992-02-29 1096 days
5: 1993-03-01 1996-02-29 1096 days
6: 1997-03-01 2000-02-29 1096 days
# aggregate in a non-equi-join
dataset[win, .(avg = mean(x)), on = .(ID, time >= start, time <= end), by = .EACHI]
      ID       time       time         avg
   1:  A 1988-03-15 1991-03-14 -0.01184078
   2:  A 1988-03-16 1991-03-15 -0.01317813
   3:  A 1988-03-17 1991-03-16 -0.01179571
   4:  A 1988-03-18 1991-03-17 -0.01006100
   5:  A 1988-03-19 1991-03-18 -0.01221798
  ---
6638:  B 1997-04-13 2000-04-12 -0.03412214
6639:  B 1997-04-14 2000-04-13 -0.03604176
6640:  B 1997-04-15 2000-04-14 -0.03556291
6641:  B 1997-04-16 2000-04-15 -0.03392185
6642:  B 1997-04-17 2000-04-16 -0.03393674

A similar question on this topic (r - moving average with a varying window) can be found on Stack Overflow: https://stackoverflow.com/questions/51750438/
