python - Pandas: multiple rolling periods

Reposted. Author: 行者123. Updated: 2023-12-01 02:36:40

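I want to compute the rolling mean and standard deviation of several columns over several window sizes at once.

Here is the code I use for rolling(5):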

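import numpy as np

def add_mean_std_cols(df):
    # Rolling mean and std over a 5-period window for every column
    res = df.rolling(5).agg(['mean', 'std'])

    # Flatten the (column, stat) MultiIndex into names such as 'col0_mean'
    res.columns = res.columns.map('_'.join)

    # Interleave each original column with its mean and std columns
    cols = np.concatenate(list(zip(df.columns, res.columns[0::2], res.columns[1::2])))

    final = res.join(df).loc[:, cols]
    return final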

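I would like to get the rolling (5), (15), (30), and (45) periods in the same operation.

I thought about iterating over the periods, but I don't know how to avoid ending up with the rolling mean/std of a rolling mean/std...

Best answer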

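I'd suggest creating a DataFrame with a MultiIndex as its columns. There is no way around a loop over the windows here, but each pass calls rolling on the original df, so no statistic is ever computed from another rolling statistic. The result is easy to index and easy to read back with pd.read_csv (passing header=[0, 1, 2] for the three column levels). Initialize an empty DataFrame from np.empty of the appropriate shape and assign its values with .loc.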

import numpy as np
import pandas as pd
np.random.seed(123)

df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']
# Column levels: window -> original column -> statistic
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])

# Preallocate the result; every cell is overwritten in the loop below
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)

for window in windows:
    # Always roll over the original df, never over df2
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values

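You now have a result, df2, with the same index as the original object. Its columns have 3 levels: the first is the window, the second is the column from the original frame, and the third is the statistic.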

print(df2.shape)
(100, 24)
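
As an alternative to preallocating with np.empty, the same three-level layout can probably be built with pd.concat, which stacks one rolling result per window side by side and uses the dict keys as the outermost column level. This is a minimal sketch rather than part of the original answer; the names pieces and alt are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# One rolling aggregation per window, each computed on the original df
# (hypothetical helper name: pieces)
pieces = {window: df.rolling(window=window).agg(stats) for window in windows}

# The dict keys become the outermost column level
alt = pd.concat(pieces, axis=1)
alt.columns = alt.columns.set_names(['window', 'feature', 'metric'])

print(alt.shape)
```

Because every entry of pieces is derived from df itself, the concern about taking a rolling statistic of a rolling statistic does not arise here either.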

This makes it easy to inspect the values for a specific rolling window:

print(df2[5])  # Rolling window = 5
feature     col0              col1              col2
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240

print(df2[5]['col0']) # Rolling window = 5, stats of col0 only
metric      mean      std
0            NaN      NaN
1            NaN      NaN
2            NaN      NaN
3            NaN      NaN
4       -0.87879  1.45348
..           ...      ...
95      -0.44231  1.02552
96      -0.58638  1.10246
97      -0.70564  0.85711
98      -0.95702  1.01302
99      -0.57026  1.10978

print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5,
# means of each column
window         5
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
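
The selections above all fix the window first. Selecting across windows works the same way; the sketch below uses DataFrame.xs and pd.IndexSlice for that. It rebuilds a frame with the same three column levels via pd.concat so that it runs on its own, and the variable names all_means and col0_std are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Same layout as df2 in the answer: window -> feature -> metric
df2 = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)
df2.columns = df2.columns.set_names(['window', 'feature', 'metric'])

# Every mean, for all windows and all features; xs drops the 'metric' level
all_means = df2.xs('mean', axis=1, level='metric')

# Standard deviation of col0 for every window, keeping all three levels
idx = pd.IndexSlice
col0_std = df2.loc[:, idx[:, 'col0', 'std']]

print(all_means.shape)
print(col0_std.columns.tolist())
```

xs is convenient when one level should disappear from the result, while pd.IndexSlice keeps the full column structure and reads more easily than nested slice(None) tuples.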

Finally, to make a single-index DataFrame, here is one way to build the flat column names with itertools.

df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

import itertools

# Build flat names such as 'col0_mean_5'. The iteration order (window, then
# column, then statistic) must match the column order of the concatenated
# values below, otherwise the labels end up on the wrong data.
iters = ['_'.join([col, stat, str(win)])
         for win, col, stat in itertools.product(windows, df.columns, stats)]

df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=iters,
                   index=df.index)
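
Another way to reach a single-level frame, sketched here under the assumption that the three-level frame has already been built, is to derive the flat names directly from the MultiIndex tuples. Labels and data then cannot drift apart, because each name is generated from the column it describes. The names wide and flat are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Three-level frame: window -> feature -> metric
wide = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)

# Each column label is a (window, feature, metric) tuple; join it into one string
flat = wide.copy()
flat.columns = ['{}_{}_{}'.format(feature, metric, window)
                for window, feature, metric in wide.columns]

print(flat.columns[:4].tolist())
# ['col0_mean_5', 'col0_std_5', 'col1_mean_5', 'col1_std_5']
```

A quick sanity check is to compare one flat column, for example flat['col1_std_15'], against df['col1'].rolling(15).std().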

Regarding python - Pandas: multiple rolling periods, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46144352/
