python - Efficient sliding window function for time series

Reposted · Author: 行者123 · Updated: 2023-12-05 05:46:51
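I'm trying to create a sliding window for a time series. So far I have a function that I managed to get working: it takes a given series, sets the window size in seconds, and then creates rolling samples. My problem is that it takes a very long time to run and seems like an inefficient approach.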

# ========== create dataset  =========================== #

import numpy as np
import pandas as pd
from datetime import timedelta, datetime


timestamp_list = ["2022-02-07 11:38:08.625",
"2022-02-07 11:38:09.676",
"2022-02-07 11:38:10.084",
"2022-02-07 11:38:10.10000",
"2022-02-07 11:38:11.2320"]

bid_price_list = [1.14338,
1.14341,
1.14340,
1.1434334,
1.1534334]

df = pd.DataFrame({'timestamp': timestamp_list, 'value': bid_price_list})

# make datetime objects
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S.%f")

# get the floor of each time (to the whole second)
df["timestamp_to_sec"] = df["timestamp"].dt.floor('s')

>>> df.head(3)
                timestamp    value    timestamp_to_sec
0 2022-02-07 11:38:08.625  1.14338 2022-02-07 11:38:08
1 2022-02-07 11:38:09.676  1.14341 2022-02-07 11:38:09
2 2022-02-07 11:38:10.084  1.14340 2022-02-07 11:38:10

# ========== create rolling time-series function  ====== #

# set rolling window length in seconds
window_dt = pd.Timedelta(seconds=2)

# containers for rolling sample statistics
n_list = []
mean_list = []
std_list =[]

# add dt (window) seconds to the original time which was floored to the second
df["timestamp_to_sec_dt"] = df["timestamp_to_sec"] + window_dt

# get unique end times
time_unique_endlist = np.unique(df.timestamp_to_sec_dt)

# remove end times that are greater than the last actual time, i.e. max(df["timestamp_to_sec"])
time_unique_endlist = time_unique_endlist[time_unique_endlist <= max(df["timestamp_to_sec"])]

# loop running the sliding window (time_i is the end time of each window)
for time_i in time_unique_endlist:

    # start time of each rolling window
    start_time = time_i - window_dt

    # sample for each time period of the sliding window
    rolling_sample = df[(df.timestamp >= start_time) & (df.timestamp <= time_i)]

    # calculate the sample statistics on the value column
    n_list.append(len(rolling_sample))                # store n observation count
    mean_list.append(rolling_sample["value"].mean())  # store rolling sample mean
    std_list.append(rolling_sample["value"].std())    # store rolling sample standard deviation

    # plot histogram for each sample of the rolling sample
    # plt.hist(rolling_sample.value, bins=10)

# tested, and n_list returns the correct values
>>> n_list
[2, 3]

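Is there a more efficient way, a way to improve my implementation, or an open-source package that lets me run a rolling window like this? I know pandas has .rolling(), but it rolls over values. I want something I can use on unevenly spaced data, where time defines a fixed rolling window.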
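For what it's worth, pandas' `.rolling()` is not limited to a fixed number of values: when the series has a sorted `DatetimeIndex`, it also accepts a time offset such as `"2s"`, so each window covers a span of time however many observations fall inside it. A minimal sketch on the sample data, assuming a reasonably recent pandas; `closed="both"` makes each window `[t - 2s, t]` inclusive, like the boolean mask in the loop. Note that these windows end at every observation rather than at the whole-second end times the loop uses, so the counts are not the same as `n_list`.

```python
import pandas as pd

# Sample data from the question
timestamp_list = ["2022-02-07 11:38:08.625",
                  "2022-02-07 11:38:09.676",
                  "2022-02-07 11:38:10.084",
                  "2022-02-07 11:38:10.10000",
                  "2022-02-07 11:38:11.2320"]
bid_price_list = [1.14338, 1.14341, 1.14340, 1.1434334, 1.1534334]

# Time-based rolling needs the timestamps as a sorted DatetimeIndex
timestamps = pd.to_datetime(timestamp_list, format="%Y-%m-%d %H:%M:%S.%f")
values = pd.Series(bid_price_list, index=timestamps)

# One window per observation time t, covering [t - 2s, t]
roll = values.rolling("2s", closed="both")

results = pd.DataFrame({
    "sample_size": roll.count(),
    "sample_mean": roll.mean(),
    "sample_std": roll.std(),  # sample std (ddof=1), NaN when a window has one value
})
print(results)
```

On the sample data the window sizes come out as 1, 2, 3, 4 and 4, because every observation gets its own window.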

Best answer

This seemed to perform best for me; hopefully it helps others.

# set rolling window length in seconds
window_dt = pd.Timedelta(seconds=2)

# add the window length (dt) to the time floored to the second
df["timestamp_to_sec_dt"] = df["timestamp_to_sec"] + window_dt

# unique end time
time_unique_endlist = np.unique(df.timestamp_to_sec_dt)

# remove end values that are greater than the last actual value, i.e. max(df["timestamp_to_sec"])
time_unique_endlist = time_unique_endlist[time_unique_endlist <= max(df["timestamp_to_sec"])]

# container for rolling sample statistics, keyed by window number
mydic = {}

# loop running the rolling window
for counter, time_i in enumerate(time_unique_endlist):

    start_time = time_i - window_dt

    # sample for each time period of the sliding window
    rolling_sample = df[(df.timestamp >= start_time) & (df.timestamp <= time_i)]

    # calculate the sample statistics
    mydic[counter] = {
        "sample_size": len(rolling_sample),
        "sample_mean": rolling_sample["value"].mean(),
        "sample_std": rolling_sample["value"].std()
    }

# results in a DataFrame, one row per window
results = pd.DataFrame.from_dict(mydic, orient="index")
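If the exact windows from the loop (ending at whole-second boundaries) are what you want, the per-iteration boolean mask over the whole frame can be avoided. The sketch below is an alternative technique, not the answer's method: it assumes the rows are sorted by time, uses `DatetimeIndex.searchsorted` to locate every window's first and last row in one vectorized call, and computes window sums as differences of cumulative sums. `window_stats` is a hypothetical helper name; it returns the same `sample_size`/`sample_mean`/`sample_std` columns as `results`.

```python
import numpy as np
import pandas as pd


def window_stats(timestamps, values, window):
    """Hypothetical helper: count/mean/std over windows [end - window, end],
    where the end times are the floored-to-second timestamps + window,
    i.e. the same grid as the loop. Assumes `timestamps` is sorted ascending."""
    ts = pd.DatetimeIndex(timestamps)
    v = np.asarray(values, dtype=float)

    floored = ts.floor("s")
    ends = (floored + window).unique().sort_values()
    ends = ends[ends <= floored.max()]
    starts = ends - window

    # Rows lo..hi-1 fall in each window: first row >= start, first row > end
    lo = ts.searchsorted(starts, side="left")
    hi = ts.searchsorted(ends, side="right")
    n = hi - lo

    # Shift by the first value to limit cancellation in the sum-of-squares formula
    shift = v[0]
    d = v - shift
    c1 = np.concatenate(([0.0], np.cumsum(d)))
    c2 = np.concatenate(([0.0], np.cumsum(d * d)))
    s1 = c1[hi] - c1[lo]
    s2 = c2[hi] - c2[lo]

    with np.errstate(divide="ignore", invalid="ignore"):
        mean = s1 / n + shift
        var = (s2 - s1 * s1 / n) / (n - 1)  # ddof=1, like pandas .std()
    std = np.sqrt(np.clip(var, 0.0, None))
    std[n < 2] = np.nan  # undefined for windows with fewer than two values

    return pd.DataFrame(
        {"sample_size": n, "sample_mean": mean, "sample_std": std}, index=ends
    )


# Usage on the question's sample data
timestamps = pd.to_datetime(
    ["2022-02-07 11:38:08.625", "2022-02-07 11:38:09.676",
     "2022-02-07 11:38:10.084", "2022-02-07 11:38:10.10000",
     "2022-02-07 11:38:11.2320"],
    format="%Y-%m-%d %H:%M:%S.%f",
)
values = [1.14338, 1.14341, 1.14340, 1.1434334, 1.1534334]
res = window_stats(timestamps, values, pd.Timedelta(seconds=2))
print(res)
```

On the sample data the window sizes are 2 and 3, matching `n_list`. The cumulative-sum variance is a standard trick but is less numerically robust than computing each window directly, which is why the values are shifted first.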

Regarding "python - Efficient sliding window function for time series", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71112144/
