gpt4 book ai didi

r - 自适应移动平均线-R中的最佳性能

转载 作者:行者123 更新时间:2023-12-03 14:30:43 25 4
gpt4 key购买 nike

我正在寻找R中滚动/滑动窗口功能方面的一些性能提升。这是非常常见的任务,可以在任何有序观测数据集中使用。我想分享我的一些发现,也许有人可以提供反馈以使其更快。
重要说明是,我专注于案例align="right"和自适应滚动窗口,因此width是一个向量(与我们的观察向量长度相同)。如果我们将width作为标量,则zooTTR软件包中已经有非常完善的函数,将很难克服(4年后:它比我预期的要容易),因为其中一些函数是即使使用Fortran(但使用wapply下面提到的用户定义的FUN仍然可以更快)。
RcppRoll软件包由于其出色的性能而值得一提,但是到目前为止,还没有函数可以回答该问题。如果有人可以扩展它来回答问题,那就太好了。

考虑我们有以下数据:

x = c(120,105,118,140,142,141,135,152,154,138,125,132,131,120)
plot(x, type="l")




并且我们想在具有可变滚动窗口 xwidth向量上应用滚动功能。

set.seed(1)
width = sample(2:4,length(x),TRUE)


在这种特殊情况下,我们将具有适应 samplec(2,3,4)的滚动功能。
我们将应用 mean函数,预期结果:

r = f(x, width, FUN = mean)
print(r)
## [1] NA NA 114.3333 120.7500 141.0000 135.2500 139.5000
## [8] 142.6667 147.0000 146.0000 131.5000 128.5000 131.5000 127.6667
plot(x, type="l")
lines(r, col="red")




可以采用任何指标来产生 width自变量作为自适应移动平均值或其他任何函数的不同变体。

寻找顶级性能。

最佳答案

2018年12月更新

自适应滚动功能的有效实现已在
data.table最近-?froll手册中的更多信息。此外,已经确定了使用碱基R的有效替代解决方案(以下fastama)。不幸的是,凯文·乌谢(Kevin Ushey)的答案并未解决该问题,因此未包含在基准测试中。
基准的规模已经增加,因为比较微秒毫无意义。

set.seed(108)
x = rnorm(1e6)
width = rep(seq(from = 100, to = 500, by = 5), length.out=length(x))
microbenchmark(
zoo=rollapplyr(x, width = width, FUN=mean, fill=NA),
mapply=base_mapply(x, width=width, FUN=mean, na.rm=T),
wmapply=wmapply(x, width=width, FUN=mean, na.rm=T),
ama=ama(x, width, na.rm=T),
fastama=fastama(x, width),
frollmean=frollmean(x, width, na.rm=T, adaptive=TRUE),
frollmean_exact=frollmean(x, width, na.rm=T, adaptive=TRUE, algo="exact"),
times=1L
)
#Unit: milliseconds
# expr min lq mean median uq max neval
# zoo 32371.938248 32371.938248 32371.938248 32371.938248 32371.938248 32371.938248 1
# mapply 13351.726032 13351.726032 13351.726032 13351.726032 13351.726032 13351.726032 1
# wmapply 15114.774972 15114.774972 15114.774972 15114.774972 15114.774972 15114.774972 1
# ama 9780.239091 9780.239091 9780.239091 9780.239091 9780.239091 9780.239091 1
# fastama 351.618042 351.618042 351.618042 351.618042 351.618042 351.618042 1
# frollmean 7.708054 7.708054 7.708054 7.708054 7.708054 7.708054 1
# frollmean_exact 194.115012 194.115012 194.115012 194.115012 194.115012 194.115012 1


ama = function(x, n, na.rm=FALSE, fill=NA, nf.rm=FALSE) {
# more or less the same as previous forloopply
stopifnot((nx<-length(x))==length(n))
if (nf.rm) x[!is.finite(x)] = NA_real_
ans = rep(NA_real_, nx)
for (i in seq_along(x)) {
ans[i] = if (i >= n[i])
mean(x[(i-n[i]+1):i], na.rm=na.rm)
else as.double(fill)
}
ans
}
fastama = function(x, n, na.rm, fill=NA) {
if (!missing(na.rm)) stop("fast adaptive moving average implemented in R does not handle NAs, input having NAs will result in incorrect answer so not even try to compare to it")
# fast implementation of adaptive moving average in R, in case of NAs incorrect answer
stopifnot((nx<-length(x))==length(n))
cs = cumsum(x)
ans = rep(NA_real_, nx)
for (i in seq_along(cs)) {
ans[i] = if (i == n[i])
cs[i]/n[i]
else if (i > n[i])
(cs[i]-cs[i-n[i]])/n[i]
else as.double(fill)
}
ans
}




旧答案:

我选择了4种不需要C ++的解决方案,很容易找到或使用google。

# 1. rollapply
library(zoo)
?rollapplyr
# 2. mapply
base_mapply <- function(x, width, FUN, ...){
FUN <- match.fun(FUN)
f <- function(i, width, data){
if(i < width) return(NA_real_)
return(FUN(data[(i-(width-1)):i], ...))
}
mapply(FUN = f,
seq_along(x), width,
MoreArgs = list(data = x))
}
# 3. wmapply - modified version of wapply found: https://rmazing.wordpress.com/2013/04/23/wapply-a-faster-but-less-functional-rollapply-for-vector-setups/
wmapply <- function(x, width, FUN = NULL, ...){
FUN <- match.fun(FUN)
SEQ1 <- 1:length(x)
SEQ1[SEQ1 < width] <- NA_integer_
SEQ2 <- lapply(SEQ1, function(i) if(!is.na(i)) (i - (width[i]-1)):i)
OUT <- lapply(SEQ2, function(i) if(!is.null(i)) FUN(x[i], ...) else NA_real_)
return(base:::simplify2array(OUT, higher = TRUE))
}
# 4. forloopply - simple loop solution
forloopply <- function(x, width, FUN = NULL, ...){
FUN <- match.fun(FUN)
OUT <- numeric()
for(i in 1:length(x)) {
if(i < width[i]) next
OUT[i] <- FUN(x[(i-(width[i]-1)):i], ...)
}
return(OUT)
}


以下是 prod功能的时序。 mean函数可能已经在 rollapplyr内部进行了优化。所有结果均相等。

library(microbenchmark)
# 1a. length(x) = 1000, window = 5-20
x <- runif(1000,0.5,1.5)
width <- rep(seq(from = 5, to = 20, by = 5), length(x)/4)
microbenchmark(
rollapplyr(data = x, width = width, FUN = prod, fill = NA),
base_mapply(x = x, width = width, FUN = prod, na.rm=T),
wmapply(x = x, width = width, FUN = prod, na.rm=T),
forloopply(x = x, width = width, FUN = prod, na.rm=T),
times=100L
)
Unit: milliseconds
expr min lq median uq max neval
rollapplyr(data = x, width = width, FUN = prod, fill = NA) 59.690217 60.694364 61.979876 68.55698 153.60445 100
base_mapply(x = x, width = width, FUN = prod, na.rm = T) 14.372537 14.694266 14.953234 16.00777 99.82199 100
wmapply(x = x, width = width, FUN = prod, na.rm = T) 9.384938 9.755893 9.872079 10.09932 84.82886 100
forloopply(x = x, width = width, FUN = prod, na.rm = T) 14.730428 15.062188 15.305059 15.76560 342.44173 100

# 1b. length(x) = 1000, window = 50-200
x <- runif(1000,0.5,1.5)
width <- rep(seq(from = 50, to = 200, by = 50), length(x)/4)
microbenchmark(
rollapplyr(data = x, width = width, FUN = prod, fill = NA),
base_mapply(x = x, width = width, FUN = prod, na.rm=T),
wmapply(x = x, width = width, FUN = prod, na.rm=T),
forloopply(x = x, width = width, FUN = prod, na.rm=T),
times=100L
)
Unit: milliseconds
expr min lq median uq max neval
rollapplyr(data = x, width = width, FUN = prod, fill = NA) 71.99894 74.19434 75.44112 86.44893 281.6237 100
base_mapply(x = x, width = width, FUN = prod, na.rm = T) 15.67158 16.10320 16.39249 17.20346 103.6211 100
wmapply(x = x, width = width, FUN = prod, na.rm = T) 10.88882 11.54721 11.75229 12.19790 106.1170 100
forloopply(x = x, width = width, FUN = prod, na.rm = T) 15.70704 16.06983 16.40393 17.14210 108.5005 100

# 2a. length(x) = 10000, window = 5-20
x <- runif(10000,0.5,1.5)
width <- rep(seq(from = 5, to = 20, by = 5), length(x)/4)
microbenchmark(
rollapplyr(data = x, width = width, FUN = prod, fill = NA),
base_mapply(x = x, width = width, FUN = prod, na.rm=T),
wmapply(x = x, width = width, FUN = prod, na.rm=T),
forloopply(x = x, width = width, FUN = prod, na.rm=T),
times=100L
)
Unit: milliseconds
expr min lq median uq max neval
rollapplyr(data = x, width = width, FUN = prod, fill = NA) 753.87882 781.8789 809.7680 872.8405 1116.7021 100
base_mapply(x = x, width = width, FUN = prod, na.rm = T) 148.54919 159.9986 231.5387 239.9183 339.7270 100
wmapply(x = x, width = width, FUN = prod, na.rm = T) 98.42682 105.2641 117.4923 183.4472 245.4577 100
forloopply(x = x, width = width, FUN = prod, na.rm = T) 533.95641 602.0652 646.7420 672.7483 922.3317 100

# 2b. length(x) = 10000, window = 50-200
x <- runif(10000,0.5,1.5)
width <- rep(seq(from = 50, to = 200, by = 50), length(x)/4)
microbenchmark(
rollapplyr(data = x, width = width, FUN = prod, fill = NA),
base_mapply(x = x, width = width, FUN = prod, na.rm=T),
wmapply(x = x, width = width, FUN = prod, na.rm=T),
forloopply(x = x, width = width, FUN = prod, na.rm=T),
times=100L
)
Unit: milliseconds
expr min lq median uq max neval
rollapplyr(data = x, width = width, FUN = prod, fill = NA) 912.5829 946.2971 1024.7245 1071.5599 1431.5289 100
base_mapply(x = x, width = width, FUN = prod, na.rm = T) 171.3189 180.6014 260.8817 269.5672 344.4500 100
wmapply(x = x, width = width, FUN = prod, na.rm = T) 123.1964 131.1663 204.6064 221.1004 484.3636 100
forloopply(x = x, width = width, FUN = prod, na.rm = T) 561.2993 696.5583 800.9197 959.6298 1273.5350 100

关于r - 自适应移动平均线-R中的最佳性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21368245/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com