
r - Need a faster rolling apply function with start and stop indices

Reposted. Author: 行者123. Last updated: 2023-12-04 09:43:26

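Below is my code. It computes the percentile of each trade's price level within a rolling 15-minute look-back window. It runs quickly on 500 or 1000 rows, but as you can see there are about 45K observations, and on the full data it is very slow. Is there a plyr function I could apply? Any other suggestions are welcome.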

This is what the trade data looks like:

> str(trade)
'data.frame': 45571 obs. of 5 variables:
$ time : chr "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...
$ prc : num 121 121 121 121 121 ...
$ siz : int 1 4 1 2 3 3 2 2 3 4 ...
$ aggress : chr "B" "B" "B" "B" ...
$ time.pos: POSIXlt, format: "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...

This is how the new column trade$time.pos is created, and what the data looks like afterwards:

trade$time.pos <- strptime(trade$time, format="%Y-%m-%d %H:%M:%OS") 

> head(trade)
time prc siz aggress time.pos
1 2013-10-20 22:00:00.489 121.3672 1 B 2013-10-20 22:00:00.489
2 2013-10-20 22:00:00.807 121.3750 4 B 2013-10-20 22:00:00.807
3 2013-10-20 22:00:00.811 121.3750 1 B 2013-10-20 22:00:00.811
4 2013-10-20 22:00:00.811 121.3750 2 B 2013-10-20 22:00:00.811
5 2013-10-20 22:00:00.811 121.3750 3 B 2013-10-20 22:00:00.811
6 2013-10-20 22:00:00.811 121.3750 3 B 2013-10-20 22:00:00.811

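# t_15_index returns the indices of the trades executed in the 15 minutes up to and
# including the current trade (t-15 to t). It assumes data_vector is sorted by time.
t_15_index <- function(data_vector, index) {
  # force the difference into seconds so the 15 * 60 threshold is always in seconds
  elapsed <- as.numeric(difftime(data_vector[index], data_vector[1:index], units = "secs"))
  which(elapsed <= 15 * 60)
}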

get_percentile <- function(data) {
  # use the argument, not the global `trade`, so the function works on any data frame
  len_d <- nrow(data)

  price_percentile <- numeric(len_d)

  for (i in seq_len(len_d)) {
    t_15 <- t_15_index(data$time.pos, i)
    # ecdf(rep(...)) builds the size-weighted empirical distribution of the trade
    # prices in the window (each price repeated once per unit traded)
    price_dist <- ecdf(rep(data$prc[t_15], data$siz[t_15]))
    # percentile of the current price level within the current (t-15 to t) window
    price_percentile[i] <- price_dist(data$prc[i])
  }
  data$price_percentile <- price_percentile
  data
}


res_trade = get_percentile(trade)
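Because the trades are sorted by time, the rows that t_15_index(x, i) selects always form a contiguous run ending at i, so the loop only needs the start of each window. Those starts can be computed for all trades at once with base R's findInterval(), instead of running which() over 1:i on every iteration (which makes the loop quadratic in the number of trades). A minimal sketch, assuming the timestamps have already been converted to numeric seconds (for the question's data, as.numeric(as.POSIXct(trade$time.pos))) and are sorted ascending; window_start is a hypothetical helper name, not part of the question:

```r
# Hypothetical helper: first row of every trade's 15-minute look-back window.
# Assumes `secs` holds numeric seconds, sorted ascending.
window_start <- function(secs, width = 15 * 60) {
  # With left.open = TRUE, findInterval() returns how many elements of `secs`
  # are strictly less than secs[i] - width; adding 1 gives the first j with
  # secs[i] - secs[j] <= width, the same condition t_15_index() tests.
  findInterval(secs - width, secs, left.open = TRUE) + 1L
}

secs <- c(0, 100, 900, 901, 2000)
start <- window_start(secs)
start
# [1] 1 1 1 2 5
# trade i's window is then start[i]:i, e.g. trade 4 uses rows 2:4
```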

Best answer

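There may be a way to speed up the rolling apply itself, but because the number of trades in each 15-minute window keeps changing, I don't think standard tools such as zoo's rollapply will work, although someone more familiar with them may have ideas. In the meantime, you can optimize the percentile calculation: instead of ecdf, which constructs a function with all the associated overhead, compute a suitable approximation directly (the timings below use the microbenchmark package):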

> vec <- rnorm(10000, 0, 3)
> val <- 5
> max(which(sort(vec) < val)) / length(vec)
[1] 0.9543
> ecdf(vec)(val)
[1] 0.9543
> microbenchmark(max(which(sort(vec) < val)) / length(vec))
Unit: milliseconds
                                    expr      min       lq   median       uq      max neval
 max(which(sort(vec) < val))/length(vec) 1.093434 1.105231 1.116364 1.141204 1.449141   100
> microbenchmark(ecdf(vec)(val))
Unit: milliseconds
           expr      min       lq   median       uq      max neval
 ecdf(vec)(val) 2.552946 2.808041 3.043579 3.439269 4.208202   100

That is roughly a 2.5x speed-up, and the improvement is larger for smaller samples.
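One caveat on the approximation: max(which(sort(vec) < val)) / length(vec) counts values strictly below val, while ecdf(vec)(val) counts values at or below it. The two agree on the continuous rnorm() sample above, but they differ whenever val itself occurs in vec, which is always the case in the question (the current trade is part of its own window, and prices repeat heavily); the approximation also returns -Inf with a warning when nothing is below val. The sketch below shows an exact, sort-free alternative and then combines it with precomputed window starts. It swaps the question's ecdf(rep(prc, siz)) for a size-weighted count, which gives the same value for positive integer sizes, and replaces t_15_index() with findInterval(). It assumes the question's columns time.pos, prc and siz with rows sorted by time; ecdf_at and get_percentile_fast are hypothetical names, and the code has not been benchmarked here.

```r
# Exact value of ecdf(x)(v) without building a function object:
# the proportion of x that is <= v (linear time, no sort needed).
ecdf_at <- function(x, v) mean(x <= v)

vec <- c(121.3672, 121.375, 121.375, 121.36)
val <- 121.375
max(which(sort(vec) < val)) / length(vec)  # strict "<" drops the tied values
# [1] 0.5
ecdf_at(vec, val)
# [1] 1
ecdf(vec)(val)
# [1] 1

# Hypothetical rewrite of get_percentile(): window starts computed once,
# and a size-weighted count instead of ecdf(rep(prc, siz)).
# Assumes rows sorted by time.pos and siz > 0 in every window.
get_percentile_fast <- function(data, width = 15 * 60) {
  secs <- as.numeric(as.POSIXct(data$time.pos))
  # first row j with secs[i] - secs[j] <= width, for every i at once
  start <- findInterval(secs - width, secs, left.open = TRUE) + 1L
  prc <- data$prc
  siz <- data$siz
  out <- numeric(length(prc))
  for (i in seq_along(prc)) {
    idx <- start[i]:i
    w <- siz[idx]
    # share of traded size at or below the current price,
    # equal to ecdf(rep(prc[idx], siz[idx]))(prc[i])
    out[i] <- sum(w[prc[idx] <= prc[i]]) / sum(w)
  }
  data$price_percentile <- out
  data
}
```

Each iteration now costs time proportional to the window size rather than to i, and no closure or repeated vector is allocated per trade.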

A similar question about r - Need a faster rolling apply function with start and stop indices can be found on Stack Overflow: https://stackoverflow.com/questions/21062927/
