
r - Detecting changes in the seasonal component using bfast

Reposted. Author: 行者123. Updated: 2023-12-05 03:04:43

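The bfast() function in the bfast package is supposed to detect both breakpoints in the long-term trend and changes in the seasonal component. One example is this graph (source):
[Figure: example bfast output]
In this graph, subplot no. 2 shows a detected change in seasonality, while subplot no. 3 shows a breakpoint in the trend.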

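However, I don't understand how to tell bfast() to look for changes or breakpoints in seasonality. All I get is breakpoints in the long-term trend. Here is a reproducible example that simulates a 50-year time series with weekly measurements of a seasonal variable y (i.e., 52 measurements per year):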

n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y)

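These data show a constant seasonal pattern, with an annual peak around week 13. Now let's introduce a shift in seasonality after year 25 by delaying the seasonal cycle by 8 weeks for years 26-50: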

move_data <- function(data, year, weeks_to_move){
x <- data[data$Year == year, "y"]
c(x[seq(52 - weeks_to_move + 1,52)], x[seq(1, 52 - weeks_to_move)])
}

mydata$y_shifted <- mydata$y
for (year in 26:50){
mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}
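
A quick sanity check on the shift, as a minimal sketch: it reuses the noise-free y_pattern and applies the same rotation as move_data() to confirm that the annual peak moves later by exactly weeks_to_move weeks. The helper name rotate_year is hypothetical and not part of the original code.

```r
# Noise-free seasonal pattern, as in the question
freq <- 52
y_pattern <- sin(seq(0, 2 * pi, length = freq))

# Hypothetical helper: same rotation as move_data(), applied to one year
rotate_year <- function(x, weeks_to_move) {
  n <- length(x)
  c(x[seq(n - weeks_to_move + 1, n)], x[seq(1, n - weeks_to_move)])
}

shifted_pattern <- rotate_year(y_pattern, weeks_to_move = 8)

peak_before <- which.max(y_pattern)        # peak week before the shift
peak_after  <- which.max(shifted_pattern)  # peak week after the shift
peak_after - peak_before                   # number of weeks the peak was delayed
```

The rotation only reorders values within a year, so the set of values in each year is unchanged and only the timing of the peak differs.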

The variable y_shifted now has its annual peak around week 13 in years 1-25 and around week 21 in years 26-50. Let's plot it and compare it with the "unshifted" variable y:

library(ggplot2)
library(magrittr)  # provides the %>% pipe

mydata$Phase <- ifelse(mydata$Year <= 25, "Year 1-25", "Year 26-50")
mydata %>%
  tidyr::gather("y_variable", "value", y, y_shifted) %>%
  ggplot(aes(Week, value, group = Year, color = Phase)) + geom_line() +
  facet_grid(.~y_variable)

[Figure: annual cycle of y and y_shifted]

Such an abrupt shift in seasonality should be easy to detect. However, when I run bfast(), it does not detect any change:

library(bfast)

y_ts <- ts(mydata$y_shifted, start = c(1,1), frequency = freq)
fit <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

[Figure: output of plot(fit)]

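As you can see, no change in seasonality is detected (subplot no. 2 above). The residuals (subplot no. 4) reflect the change in seasonality, which becomes apparent if we plot the residuals by week of the year: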

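# Take the residuals (Nt) from the final iteration, which is the one plot(fit) shows
niter <- length(fit$output)
mydata$Residuals <- as.numeric(fit$output[[niter]]$Nt)
ggplot(mydata, aes(Week, Residuals, group = Year, color = Phase)) + geom_point()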

[Figure: residuals vs. week of the year, colored by phase (years 1-25 and 26-50)]
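
Why the residuals split by phase can be illustrated without bfast. The following is a minimal sketch, not bfast's actual algorithm: it fits one ordinary least-squares harmonic regression (three harmonics, an assumption) over the whole shifted series, which stands in for a seasonal component with no breaks, and then compares the mean residual per week in the two phases. The fixed seed and all variable names are illustrative.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2 * pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq * n_years, sd = 0.1)

# Delay years 26-50 by 8 weeks (same rotation as move_data())
y_mat <- matrix(y, nrow = freq)            # one column per year
late <- 26:n_years
y_mat[, late] <- y_mat[c(45:52, 1:44), late]
y_shifted <- as.vector(y_mat)

# One harmonic regression (orders 1-3) over the whole series, no breaks
time_idx <- seq_along(y_shifted)
X <- do.call(cbind, lapply(1:3, function(k)
  cbind(sin(2 * pi * k * time_idx / freq),
        cos(2 * pi * k * time_idx / freq))))
res <- residuals(lm(y_shifted ~ X))

# Mean residual per week, separately for each phase
week  <- rep(1:freq, n_years)
early <- rep(1:n_years, each = freq) <= 25
prof_early <- tapply(res[early],  week[early],  mean)
prof_late  <- tapply(res[!early], week[!early], mean)
cor(prof_early, prof_late)   # strongly negative: the two phases mirror each other
```

Because a single seasonal curve has to compromise between the two phases, it sits roughly halfway between them, so each phase is left with a residual pattern of opposite sign.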

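I feel that I need to change some parameter or option to make bfast() look for seasonal changes, but which one? I was not able to dig this information out of the documentation.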

Best answer

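I ran into the same problem when testing bfast on my consumer portfolio data and failed to find any real solution. I went on to dig into the bfast literature from the Earth remote sensing community, where bfast was first developed and is widely used. My understanding is that there is very little you can do to make bfast consistently fit a useful seasonal component.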

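A few days ago, I came across this Quora discussion on the best software for time series analysis and found that there is a new R package, Rbeast, for breakpoint detection and time series decomposition. There is also a nice tweet showing a quick comparison between bfast and Rbeast.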

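After some experimenting, I found that Rbeast is able to pinpoint the seasonal breakpoints in both my data and yours. Frankly, I still have no idea how Rbeast works. The BEAST algorithm in Rbeast looks rather complicated and produces a lot of output; it is not well documented and is not as easy to use as bfast. Let me show the results I got, first with your data and then with a second artificial time series.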

Your data

# The original code to generate your data
n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y)

move_data <- function(data, year, weeks_to_move){
x <- data[data$Year == year, "y"]
c(x[seq(52 - weeks_to_move + 1,52)], x[seq(1, 52 - weeks_to_move)])
}

mydata$y_shifted <- mydata$y
for (year in 26:50){
mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}

# Your data analyzed by the BEAST algorithm in Rbeast
# (freq is the argument name used here; newer Rbeast releases may call it period, see ?beast)
library(Rbeast)
fit <- beast(mydata$y_shifted, freq=52)
print(fit)
plot(fit)
#####################################################################
# Seasonal Changepoints #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|* |
|Pr(ncp = 1 )=0.999|*********************************************** |
|Pr(ncp = 2 )=0.001|* |
|Pr(ncp = 3 )=0.000|* |
|Pr(ncp = 4 )=0.000|* |
|Pr(ncp = 5 )=0.000|* |
|Pr(ncp = 6 )=0.000|* |
|Pr(ncp = 7 )=0.000|* |
|Pr(ncp = 8 )=0.000|* |
|Pr(ncp = 9 )=0.000|* |
|Pr(ncp = 10)=0.000|* |
.-------------------------------------------------------------------.
| Summary for number of Seasonal ChangePoints (scp) |
.-------------------------------------------------------------------.
|ncp_max = 10 | MaxSeasonKnotNum: A parameter you set |
|ncp_mode = 1 | Pr(ncp= 1)=1.00: There is a 99.9% probability |
| | that the seasonal component has 1 chgnpt(s). |
|ncp_mean = 1.00 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10 |
|ncp_pct10 = 1.00 | 10% percentile for number of changepoints |
|ncp_median = 1.00 | 50% percentile: Median number of changepoints |
|ncp_pct90 = 1.00 | 90% percentile for number of changepoints |
.-------------------------------------------------------------------.
| List of probable seasonal changepoints ranked by probability of |
| occurrence: Please combine the ncp reported above to determine |
| which changepoints below are practically meaningful |
'-------------------------------------------------------------------'
|scp# |time (cp) |prob(cpPr) |
|------------------|---------------------------|--------------------|
|1 |1301.000000 |1.00000 |
.-------------------------------------------------------------------.

[Figure: output of plot(fit) from Rbeast]

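The abrupt seasonal shift is detected precisely. Rbeast also gives the probabilities of seasonal and trend breakpoints (the red and green curves in the Pr(scp) and Pr(tcp) panels of the figure above). The probability of the detected seasonal shift is very high, close to 1.0. The trend of your data is a flat line; it is essentially a zero constant, and the probability of finding breakpoints in the trend (called changepoints in Rbeast) is close to zero throughout.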
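
The reported changepoint time (1301) is a position in the series, assuming, as here, that beast() was called without a start time so that observations are counted 1, 2, 3, and so on. A small sketch converting that position back to the Year and Week columns of mydata (the variable names are illustrative):

```r
freq <- 52
cp <- 1301   # seasonal changepoint position taken from the printout above

cp_year <- (cp - 1) %/% freq + 1   # year containing the changepoint
cp_week <- (cp - 1) %% freq + 1    # week within that year
c(year = cp_year, week = cp_week)  # year 26, week 1
```

Since 25 full years span 25 * 52 = 1300 observations, position 1301 is the first week of year 26, which is exactly where the 8-week delay was introduced.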

Second time series

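A cool feature of Rbeast is that it estimates the sin and cos orders of the harmonic seasonal model. Below, I generate a time series that has three seasonal segments (i.e., two breaks) plus a sloped trend with no breaks. The three seasonal segments have different sin orders: 1, 2, and 3, respectively.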

# Generate a sample time series with three seasonal segments
# the sin/cos orders for the three segs are different.
seg1 <- 1:1000
seg2 <- 1001:2000
seg3 <- 2001:3000
new_data <- c( sin(seg1*2*pi/52), 0.6*sin( seg2*2*pi/52*2), 0.3*sin( seg3*2*pi/52*3)) + (1:3000)*0.0002+ rnorm(3000, sd = 0.1)
# Test bfast using new_data
y_ts <- ts(new_data, start = c(1,1), frequency = 52)
fit <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

Surprisingly, bfast does not detect any seasonal breaks, even though the three segments are easy to spot in the plotted data Yt.

# Analyze the new_data time series using `Rbeast`

fit <- beast(new_data, freq=52)
print(fit)
plot(fit)
#####################################################################
# Seasonal Changepoints #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|* |
|Pr(ncp = 1 )=0.000|* |
|Pr(ncp = 2 )=0.969|*********************************************** |
|Pr(ncp = 3 )=0.031|** |
|Pr(ncp = 4 )=0.000|* |
|Pr(ncp = 5 )=0.000|* |
|Pr(ncp = 6 )=0.000|* |
|Pr(ncp = 7 )=0.000|* |
|Pr(ncp = 8 )=0.000|* |
|Pr(ncp = 9 )=0.000|* |
|Pr(ncp = 10)=0.000|* |
.-------------------------------------------------------------------.
| Summary for number of Seasonal ChangePoints (scp) |
.-------------------------------------------------------------------.
|ncp_max = 10 | MaxSeasonKnotNum: A parameter you set |
|ncp_mode = 2 | Pr(ncp= 2)=0.97: There is a 96.9% probability |
| | that the seasonal component has 2 chgnpt(s). |
|ncp_mean = 2.03 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10 |
|ncp_pct10 = 2.00 | 10% percentile for number of changepoints |
|ncp_median = 2.00 | 50% percentile: Median number of changepoints |
|ncp_pct90 = 2.00 | 90% percentile for number of changepoints |
.-------------------------------------------------------------------.
| List of probable seasonal changepoints ranked by probability of |
| occurrence: Please combine the ncp reported above to determine |
| which changepoints below are practically meaningful |
'-------------------------------------------------------------------'
|scp# |time (cp) |prob(cpPr) |
|------------------|---------------------------|--------------------|
|1 |2001.000000 |1.00000 |
|2 |1001.000000 |1.00000 |
|3 |1027.000000 |0.02942 |
.-------------------------------------------------------------------.

[Figure: output of plot(fit) from Rbeast]

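Above is the result from Rbeast. The two breaks and the three seasonal segments are recovered, and Rbeast finds no breaks in the trend. In the Order_s panel above, the three sin and cos orders are recovered correctly. The Order_s curve also shows the locations of the two seasonal breaks.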
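
What the harmonic order means for a single segment can be sketched with ordinary least squares. This is not BEAST's Bayesian procedure, only an illustration: the second segment is generated with order 2, so a fit restricted to order 1 should explain almost none of its variance, while a fit that includes order 2 should explain most of it. The helper harmonic_r2 and the fixed seed are hypothetical additions, and the trend term is left out for simplicity.

```r
set.seed(1)  # assumption: fixed seed for reproducibility
seg2 <- 1001:2000
y2 <- 0.6 * sin(seg2 * 2 * pi / 52 * 2) + rnorm(length(seg2), sd = 0.1)

# Hypothetical helper: R-squared of a harmonic fit with orders 1..max_order
harmonic_r2 <- function(y, time_idx, max_order, period = 52) {
  X <- do.call(cbind, lapply(seq_len(max_order), function(k)
    cbind(sin(2 * pi * k * time_idx / period),
          cos(2 * pi * k * time_idx / period))))
  summary(lm(y ~ X))$r.squared
}

r2_order1 <- harmonic_r2(y2, seg2, max_order = 1)
r2_order2 <- harmonic_r2(y2, seg2, max_order = 2)
c(order1 = r2_order1, order2 = r2_order2)
```

A seasonal model with a fixed low order cannot represent a segment whose cycle repeats two or three times per year, which is one reason an estimated per-segment order is informative.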

Regarding "r - Detecting changes in the seasonal component using bfast", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52708697/
