python - Rolling mean over anti-diagonals in Python

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 13:56:33

I have a pandas pivot table that I shifted earlier, and it now looks like this:

pivot
A B C D E
0 5.3 5.1 3.5 4.2 4.5
1 5.3 4.1 3.5 4.2 NaN
2 4.3 4.1 3.5 NaN NaN
3 4.3 4.1 NaN NaN NaN
4 4.3 NaN NaN NaN NaN

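I am trying to compute a rolling mean along the anti-diagonals, iterating over each column, with a variable window (3 and 4 periods in this example), and to store the result in a new DataFrame, like this: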

expected_df with a 3-period window
A B C D E
0 4.3 4.1 3.5 4.2 4.5

expected_df with a 4-period window
A B C D E
0 4.5 4.3 3.5 4.2 4.5

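So far, I have tried subsetting the original pivot table into a separate DataFrame that holds only the values inside the specified window for each column, and then computing the mean. The subset looks like this: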

subset
A B C D E
0 4.3 4.1 3.5 4.2 4.5
1 4.3 4.1 3.5 4.2 NaN
2 4.3 4.1 3.5 NaN NaN

To build it, I wrote the following for loop:

import pandas as pd

df2 = pd.DataFrame()
size = pivot.shape[0]
window = 3

for i in range(size):
    # intended: the last `window` non-NaN values of column i
    df2[i] = pivot.iloc[size-window-i:size-i, i]

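Even though pivot.iloc[size-window-i:size-i, i] does return the values I need when I pass the indices in manually, inside the for loop it misses the first value of the second column, and so on: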

df2
A B C D E
0 4.3 NaN NaN NaN NaN
1 4.3 4.1 NaN NaN NaN
2 4.3 4.1 3.5 NaN NaN

Does anyone know how to compute this moving average, or how to fix the for loop? Thanks in advance for your input.
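The loop above probably goes wrong for two reasons (an editor's reading, not part of the original post). First, assigning a Series to `df2[i]` aligns it on its row labels: after the first column, `df2` has labels 2 to 4, so column B's slice (labels 1 to 3) loses its first value, and later columns lose more. Second, for columns D and E the start `size - window - i` is negative, so `iloc` counts from the end and returns an empty slice. A minimal sketch of a corrected loop, assuming the staircase layout shown above (column i has `size - i` valid rows), strips the labels with `.to_numpy()` and clamps the start at 0:

```python
import numpy as np
import pandas as pd

# The shifted pivot table from the question.
pivot = pd.DataFrame(
    {
        "A": [5.3, 5.3, 4.3, 4.3, 4.3],
        "B": [5.1, 4.1, 4.1, 4.1, np.nan],
        "C": [3.5, 3.5, 3.5, np.nan, np.nan],
        "D": [4.2, 4.2, np.nan, np.nan, np.nan],
        "E": [4.5, np.nan, np.nan, np.nan, np.nan],
    }
)

size = pivot.shape[0]
window = 3

columns = {}
for i, name in enumerate(pivot.columns):
    stop = size - i                # one past the last valid row of column i
    start = max(stop - window, 0)  # clamp so iloc never counts from the end
    # .to_numpy() drops the row labels, so no index alignment happens below
    columns[name] = pd.Series(pivot.iloc[start:stop, i].to_numpy())

# Series of different lengths are padded with NaN on a 0..window-1 index
df2 = pd.DataFrame(columns)
expected_df = df2.mean().to_frame().T
print(df2)
print(expected_df)
```

Collecting the columns in a dict and building the DataFrame once also avoids growing `df2` column by column; the resulting `df2` has the same layout as the `subset` table in the question.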

Best answer

IIUC:

Shift everything back into line:

shifted = pd.concat([df.iloc[:, i].shift(i) for i in range(df.shape[1])], axis=1)
shifted

A B C D E
0 5.3 NaN NaN NaN NaN
1 5.3 5.1 NaN NaN NaN
2 4.3 4.1 3.5 NaN NaN
3 4.3 4.1 3.5 4.2 NaN
4 4.3 4.1 3.5 4.2 4.5
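Each `shift(i)` moves column i down by i rows, so afterwards `shifted[k, j]` equals the original `a[k - j, j]`, and each row of `shifted` holds one anti-diagonal of the original table. A plain NumPy sketch of the same step, for illustration only (the helper name `shift_columns` is hypothetical, and it assumes a float array so NaN can be used as the fill value):

```python
import numpy as np


def shift_columns(a):
    """Shift column j down by j rows, filling with NaN (like Series.shift(j))."""
    h, w = a.shape
    out = np.full((h, w), np.nan)
    for j in range(min(w, h)):
        # out[k, j] = a[k - j, j] for k >= j; the rows above stay NaN
        out[j:, j] = a[: h - j, j]
    return out


# The pivot values from the question as a plain float array.
a = np.array(
    [
        [5.3, 5.1, 3.5, 4.2, 4.5],
        [5.3, 4.1, 3.5, 4.2, np.nan],
        [4.3, 4.1, 3.5, np.nan, np.nan],
        [4.3, 4.1, np.nan, np.nan, np.nan],
        [4.3, np.nan, np.nan, np.nan, np.nan],
    ]
)
shifted = shift_columns(a)
print(shifted[-1])  # last row: the final element of every anti-diagonal run
```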

Then you can take your mean.

# Change this 🡇 to get the last n number of rows
shifted.iloc[-3:].mean()

A 4.3
B 4.1
C 3.5
D 4.2
E 4.5
dtype: float64
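To get the one-row `expected_df` from the question for any window size, the same two steps can be wrapped in a small function. This is a sketch; `anti_diagonal_mean` is a hypothetical name, and `.to_frame().T` simply turns the Series of column means into a single-row DataFrame:

```python
import numpy as np
import pandas as pd


def anti_diagonal_mean(df, window):
    """One-row DataFrame with the mean of the last `window` values of each
    column after shifting column i down by i rows (the answer's approach)."""
    shifted = pd.concat(
        [df.iloc[:, i].shift(i) for i in range(df.shape[1])], axis=1
    )
    # mean() skips NaN, so columns with fewer than `window` values (D, E)
    # are averaged over the values they do have.
    return shifted.iloc[-window:].mean().to_frame().T


pivot = pd.DataFrame(
    {
        "A": [5.3, 5.3, 4.3, 4.3, 4.3],
        "B": [5.1, 4.1, 4.1, 4.1, np.nan],
        "C": [3.5, 3.5, 3.5, np.nan, np.nan],
        "D": [4.2, 4.2, np.nan, np.nan, np.nan],
        "E": [4.5, np.nan, np.nan, np.nan, np.nan],
    }
)

print(anti_diagonal_mean(pivot, 3))
print(anti_diagonal_mean(pivot, 4))
```

With `window=4`, column A averages 5.3, 4.3, 4.3 and 4.3, which is 4.55, and column B gives 4.35; the 4.5 and 4.3 in the question's expected table appear to be these values cut to one decimal place.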

Or a rolling mean:

#   Change this 🡇 to get the last n number of rows
shifted.rolling(3, min_periods=1).mean()

A B C D E
0 5.300000 NaN NaN NaN NaN
1 5.300000 5.100000 NaN NaN NaN
2 4.966667 4.600000 3.5 NaN NaN
3 4.633333 4.433333 3.5 4.2 NaN
4 4.300000 4.100000 3.5 4.2 4.5

NumPy strides

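I'll use strides to build a 3-D array and take the mean along one of its axes. This is faster, but confusing...

Also, I wouldn't actually use this. I just wanted to work out how to capture the diagonal elements with strides. It was more of an exercise for me, and I wanted to share it.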
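The core trick in the code below is the stride `s1 - s0`: stepping `s1` bytes moves one column right, and subtracting `s0` moves one row up, so repeated steps walk along an anti-diagonal. A tiny illustration on a 3x3 array (not from the answer). Note that `as_strided` does no bounds checking, so the view must stay inside the array's buffer; `writeable=False` guards against accidental writes through the view:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

b = np.arange(9, dtype=np.float64).reshape(3, 3)
s0, s1 = b.strides  # bytes to move one row down / one column right

# Start at b[2, 0]; each step of (s1 - s0) bytes goes one column right and
# one row up, visiting b[2, 0], b[1, 1], b[0, 2] (all inside b's buffer).
anti = as_strided(b[2:], shape=(3,), strides=(s1 - s0,), writeable=False)
print(anti)  # holds 6, 4, 2
```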

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided as strided

a = df.values

roll = 3
r_ = roll - 1 # one less than roll

h, w = a.shape
w_ = w - 1 # one less than width

b = np.empty((h + 2 * w_ + r_, w), dtype=a.dtype)
b.fill(np.nan)
b[w_ + r_:-w_] = a

s0, s1 = b.strides
a_ = np.nanmean(strided(b, (h + w_, roll, w), (s0, s0, s1 - s0))[w_:], axis=1)

pd.DataFrame(a_, df.index, df.columns)

A B C D E
0 5.300000 NaN NaN NaN NaN
1 5.300000 5.100000 NaN NaN NaN
2 4.966667 4.600000 3.5 NaN NaN
3 4.633333 4.433333 3.5 4.2 NaN
4 4.300000 4.100000 3.5 4.2 4.5
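If NumPy 1.20 or newer is available, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of windowed view with bounds checking and a read-only result, so the strides do not have to be computed by hand. This is a swapped-in technique rather than the answer's method; `rolling_anti_diagonal_mean` is a hypothetical name, and the sketch assumes a float array `a` in the question's staircase layout:

```python
import warnings

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def rolling_anti_diagonal_mean(a, roll):
    """Rolling nanmean of length `roll` down each column after shifting
    column j down by j rows, i.e. along the anti-diagonals of `a`."""
    h, w = a.shape
    # Pad roll - 1 NaN rows on top; shifted[roll - 1 + j + m, j] = a[m, j].
    shifted = np.full((h + roll - 1, w), np.nan)
    for j in range(min(w, h)):
        shifted[roll - 1 + j:, j] = a[: h - j, j]
    # Shape (h, w, roll): window k covers padded rows k .. k + roll - 1.
    windows = sliding_window_view(shifted, roll, axis=0)
    with warnings.catch_warnings():
        # All-NaN windows would otherwise warn "Mean of empty slice".
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(windows, axis=-1)


a = np.array(
    [
        [5.3, 5.1, 3.5, 4.2, 4.5],
        [5.3, 4.1, 3.5, 4.2, np.nan],
        [4.3, 4.1, 3.5, np.nan, np.nan],
        [4.3, 4.1, np.nan, np.nan, np.nan],
        [4.3, np.nan, np.nan, np.nan, np.nan],
    ]
)
result = pd.DataFrame(rolling_anti_diagonal_mean(a, 3), columns=list("ABCDE"))
print(result)
```

The padding with `roll - 1` NaN rows plays the role of `min_periods=1` in the pandas version: early windows simply contain fewer real values.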

Numba

I feel better about this than about using strides.

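import warnings

import numpy as np
import pandas as pd
from numba import njit

@njit
def dshift(a, roll):
    # b[i, r, j] = a[i - j - r, j]: the value r rows above row i in the
    # shifted column j, or NaN where that value does not exist
    h, w = a.shape
    b = np.empty((h, roll, w), dtype=np.float64)
    b.fill(np.nan)

    for r in range(roll):
        for i in range(h):
            for j in range(w):
                k = i - j - r
                if k >= 0:
                    b[i, r, j] = a[k, j]

    return b

a = df.values  # same array as in the strides example

with warnings.catch_warnings():
    # all-NaN windows make np.nanmean warn "Mean of empty slice"
    warnings.simplefilter('ignore', category=RuntimeWarning)
    df_ = pd.DataFrame(np.nanmean(dshift(a, 3), axis=1), df.index, df.columns)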

df_

A B C D E
0 5.300000 NaN NaN NaN NaN
1 5.300000 5.100000 NaN NaN NaN
2 4.966667 4.600000 3.5 NaN NaN
3 4.633333 4.433333 3.5 4.2 NaN
4 4.300000 4.100000 3.5 4.2 4.5

For python - Rolling mean over anti-diagonals in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54870279/
