
java - Time-series resampling of historical trade data

Reposted. Author: 行者123. Updated: 2023-11-28 07:11:24

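I have some historical trade data in a csv file, in the format: unixtime, price, volume. I want to analyze this data.

I managed to do it in Python, but it is painfully slow (it took about 2 days to run the algorithm on a 30-day test set).

I am trying to do it in c/c++, or even Java or Scala, but my main problem is that I cannot resample the data. I need to resample it into the format: datetime, open, high, low, close, volume, at 15-minute intervals, and I can't find any way to do that in c/c++.

In Python, this does what I want (it uses a pandas DataFrame):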

def resample_data(raw_data, time_frame):
    # Resamples the ticker data into OHLC bars.
    # Note: the how= argument to resample() comes from older pandas releases;
    # it was deprecated in pandas 0.18.
    resampledData = raw_data.copy()
    ohlc_dict = {
        'open': 'first',
        'high': 'max',
        'low': 'min',
        'close': 'last',
        'price': 'first'
    }

    resampledData = resampledData.resample(time_frame, how={'price': ohlc_dict, 'amount': 'sum'})
    # Intervals without trades get zero volume
    resampledData.amount = resampledData['amount']['sum'].fillna(0.0)
    # Carry the last close forward, then use it to fill the remaining price columns
    resampledData['price']['close'] = resampledData['price']['close'].fillna(method='pad')
    resampledData = resampledData.apply(lambda x: x.fillna(resampledData['price']['close']))

    return resampledData

Any ideas (or libraries) for doing this in c/c++/Java/Scala?

Best answer

This is just a simple example of what you can do with the standard Scala library. The code can be run in the Scala REPL:

// not importing external libraries like Joda time and its Scala wrappers
import scala.annotation.tailrec

case class Sample(value: Double, timeMillis: Long)
case class SampleAggregate(startTimeMillis: Long, endTimeMillis: Long,
                           min: Sample, max: Sample)

val currentMillis = System.currentTimeMillis
val inSec15min = 15 * 60
val inMillis15min = inSec15min * 1000
// sample each second:
val data = (1 to inSec15min * 100).map { i =>
  Sample(i, currentMillis + i * 1000) }.toList

// Each window starts at the first remaining sample and covers
// intervalDurationMillis; the input is assumed to be sorted by time.
@tailrec
def aggregate(xs: List[Sample], intervalDurationMillis: Long,
              accu: List[SampleAggregate]): List[SampleAggregate] =
  xs match {
    case h :: t =>
      val start = h.timeMillis
      // slice = samples inside the current window, rest = everything after it
      val (slice, rest) = xs.span(_.timeMillis < (start + intervalDurationMillis))
      val end = slice.last.timeMillis
      val aggr = SampleAggregate(start, end, slice.minBy(_.value),
                                 slice.maxBy(_.value))
      aggregate(rest, intervalDurationMillis, aggr :: accu)
    case Nil =>
      accu.reverse
  }

val result = aggregate(data, inMillis15min, Nil)

Fake data:

data.take(10).foreach(println)
Sample(1.0,1388809630677)
Sample(2.0,1388809631677)
Sample(3.0,1388809632677)
Sample(4.0,1388809633677)
Sample(5.0,1388809634677)
Sample(6.0,1388809635677)
Sample(7.0,1388809636677)
Sample(8.0,1388809637677)
Sample(9.0,1388809638677)
Sample(10.0,1388809639677)

Result:

result.foreach(println)
SampleAggregate(1388809630677,1388810529677,Sample(1.0,1388809630677),Sample(900.0,1388810529677))
SampleAggregate(1388810530677,1388811429677,Sample(901.0,1388810530677),Sample(1800.0,1388811429677))
SampleAggregate(1388811430677,1388812329677,Sample(1801.0,1388811430677),Sample(2700.0,1388812329677))
SampleAggregate(1388812330677,1388813229677,Sample(2701.0,1388812330677),Sample(3600.0,1388813229677))
SampleAggregate(1388813230677,1388814129677,Sample(3601.0,1388813230677),Sample(4500.0,1388814129677))
SampleAggregate(1388814130677,1388815029677,Sample(4501.0,1388814130677),Sample(5400.0,1388815029677))
SampleAggregate(1388815030677,1388815929677,Sample(5401.0,1388815030677),Sample(6300.0,1388815929677))
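
The aggregate above keeps only the minimum and maximum sample of each window, while the question asks for open, high, low, close and volume. As a rough sketch of how the same span-based loop might produce such rows, the following assumes trades sorted by time with non-negative timestamps in seconds. Trade, Bar and toBars are hypothetical names, not part of the original answer. Unlike the code above, it aligns windows to multiples of the interval instead of to the first sample, and it skips empty intervals rather than forward-filling them as the pandas version does.

```scala
import scala.annotation.tailrec

// Hypothetical types for illustration; not part of the original answer.
case class Trade(timeSec: Long, price: Double, amount: Double)
case class Bar(startSec: Long, open: Double, high: Double, low: Double,
               close: Double, volume: Double)

// Assumes trades are sorted by time and timestamps are non-negative.
// Windows are aligned to multiples of intervalSec since the epoch;
// intervals without trades produce no bar.
def toBars(trades: List[Trade], intervalSec: Long): List[Bar] = {
  @tailrec
  def loop(xs: List[Trade], accu: List[Bar]): List[Bar] = xs match {
    case h :: _ =>
      // start of the aligned window that contains the first remaining trade
      val start = h.timeSec - h.timeSec % intervalSec
      val (slice, rest) = xs.span(_.timeSec < start + intervalSec)
      val prices = slice.map(_.price)
      val bar = Bar(start, prices.head, prices.max, prices.min, prices.last,
                    slice.map(_.amount).sum)
      loop(rest, bar :: accu)
    case Nil =>
      accu.reverse
  }
  loop(trades, Nil)
}

val trades = List(
  Trade(900L, 10.0, 1.0),
  Trade(1000L, 12.0, 2.0),
  Trade(1799L, 9.0, 0.5),
  Trade(1800L, 11.0, 3.0))

val bars = toBars(trades, 15 * 60)
bars.foreach(println)
```

With this input the first three trades fall into the window starting at 900 and the last trade opens a new window at 1800. Filling empty intervals with the previous close would need an extra pass over the result.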

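Instead of a fixed duration, we could pass a function that is used in the span predicate to define the interval (for example hours or days). The input could also be turned into a Stream when reading from a file.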
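
One possible reading of these two remarks, sketched under assumptions: the interval is given as a function from a timestamp to the start of its bucket, and the trades arrive lazily, one line at a time. The sketch uses an Iterator rather than a Stream (Stream is deprecated since Scala 2.13 in favour of LazyList), and in-memory lines stand in for a file. Trade, Bar, lazyBars, parse, hourly and daily are hypothetical names, and the "unixtime,price,amount" layout without a header row is taken from the question's description.

```scala
// Hypothetical types for illustration; not part of the original answer.
case class Trade(timeSec: Long, price: Double, amount: Double)
case class Bar(startSec: Long, open: Double, high: Double, low: Double,
               close: Double, volume: Double)

// Interval functions: map a unix timestamp (seconds) to its bucket start.
val hourly: Long => Long = t => t - t % 3600L
val daily: Long => Long = t => t - t % 86400L // days in UTC

// Lazily folds consecutive trades that share a bucket into one Bar.
// Assumes the input is sorted by time; only one bar is held in memory.
def lazyBars(trades: Iterator[Trade], bucketOf: Long => Long): Iterator[Bar] = {
  val in = trades.buffered
  new Iterator[Bar] {
    def hasNext: Boolean = in.hasNext
    def next(): Bar = {
      val first = in.next()
      val start = bucketOf(first.timeSec)
      var bar = Bar(start, first.price, first.price, first.price,
                    first.price, first.amount)
      // peek at the next trade without consuming it
      while (in.hasNext && bucketOf(in.head.timeSec) == start) {
        val t = in.next()
        bar = bar.copy(high = math.max(bar.high, t.price),
                       low = math.min(bar.low, t.price),
                       close = t.price,
                       volume = bar.volume + t.amount)
      }
      bar
    }
  }
}

// Parses one "unixtime,price,amount" line.
def parse(line: String): Trade = {
  val f = line.split(",").map(_.trim)
  Trade(f(0).toLong, f(1).toDouble, f(2).toDouble)
}

// For a real file, scala.io.Source.fromFile("trades.csv").getLines()
// (hypothetical path) would give the same kind of Iterator[String].
val raw = List("0,10.0,1.0", "1800,12.0,2.0", "3600,11.0,0.5", "90000,8.0,4.0")

val hourlyBars = lazyBars(raw.iterator.map(parse), hourly).toList
val dailyBars = lazyBars(raw.iterator.map(parse), daily).toList
hourlyBars.foreach(println)
dailyBars.foreach(println)
```

Because each bar is built while the lines are consumed, the whole file never has to sit in memory; the price is that the input must already be in time order.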

Regarding java - Time-series resampling of historical trade data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20916071/
