python - Change value after a specific cumulative change

Reposted. Author: 行者123. Updated: 2023-12-04 01:23:55

I have the following data:

data = [0.1, 0.2, 0.3, 0.4 , 0.5, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, -0.1,
-0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.9, -1.2, -0.1, -0.7]

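Every time a data point changes by more than the step size, I want to record it. Otherwise I want to keep the old value until the cumulative change is at least as large as the step size. I implemented this iteratively, like this: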

import pandas as pd
from copy import deepcopy
import numpy as np

step = 0.5
df_steps = pd.Series(data)
df = df_steps.copy()

today = None
yesterday = None
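# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for index, value in df_steps.items():
    today = deepcopy(index)
    if today is not None and yesterday is not None:
        # Record the new value only if it differs from the last kept value by more than step
        if abs(df.loc[today] - df_steps.loc[yesterday]) > step:
            df_steps.loc[today] = df.loc[today]
        else:
            df_steps.loc[today] = df_steps.loc[yesterday]

    yesterday = deepcopy(today)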

My final result is:

[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7, 0.7, 0.7, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, -0.5, -0.5, -0.5, -0.5, -1.2, -0.1, -0.7]
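
For reference, the rule can also be written without pandas. The following is a minimal sketch, not the asker's code; the function name `hold_until_step` is made up here for illustration. It keeps the last recorded value and replaces it only when a new point differs from it by strictly more than `step`. On the sample data it reproduces the result listed above.

```python
def hold_until_step(values, step):
    """Hold each recorded value until a new point differs from it
    by strictly more than `step` (hypothetical helper)."""
    out = []
    kept = None  # last recorded value; None until the first point is seen
    for v in values:
        if kept is None or abs(v - kept) > step:
            kept = v  # change is large enough: record the new value
        out.append(kept)
    return out


data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, -0.1,
        -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.9, -1.2, -0.1, -0.7]

result = hold_until_step(data, 0.5)
print(result)
```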

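Problem and question

The problem is that this is implemented iteratively (I agree with the second answer here). My question is: how can I achieve the same result in a vectorized way?

Attempt

My attempt is below, but it does not match the result: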

(df.diff().cumsum().replace(np.nan, 0) / step).astype(int)
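
One way to see why this attempt cannot match: `diff().cumsum()` telescopes to the distance from the first element, so (up to floating-point rounding) the expression assigns each value the bucket `int((x - data[0]) / step)`. That bucket depends only on the current value, while the desired output depends on which value was last recorded. The sketch below emulates the expression in plain Python under that assumption; the helper name `bucket` is hypothetical.

```python
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, -0.1,
        -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.9, -1.2, -0.1, -0.7]
step = 0.5


def bucket(x, first=data[0]):
    # Assumed equivalent of (diff().cumsum() / step).astype(int) for one element:
    # the cumulative sum of differences is x - first, truncated toward zero.
    return int((x - first) / step)


buckets = [bucket(x) for x in data]

# data[4] and data[8] are both 0.5, so they always land in the same bucket ...
same_bucket = buckets[4] == buckets[8]
# ... yet the desired result holds 0.1 at index 4 and 0.7 at index 8.
print(same_bucket)
```

A stateless mapping from value to bucket cannot distinguish those two positions, so some form of carried state is needed.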

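Best answer

Since a purely vectorized approach does not seem straightforward, we can use numba to compile the code down to C level, which gives an approach that still loops but is very performant. Here is one way using numba's nopython mode: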

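import numpy as np
from numba import njit, float64

# step is declared as float64 so it is compared at the same precision as the data
@njit('float64[:](float64[:], float64)')
def set_at_cum_change(a, step):
    out = np.empty(len(a), dtype=float64)
    prev = a[0]  # last recorded value
    out[0] = a[0]
    for i in range(1, len(a)):
        current = a[i]
        if np.abs(current - prev) > step:
            # Change exceeds step: record the new value
            out[i] = current
            prev = current
        else:
            # Otherwise carry the previous output forward
            out[i] = out[i-1]
    return out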

Testing on the same array gives:

data = np.array([0.1, 0.2, 0.3, 0.4 , 0.5, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, -0.1,
-0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.9, -1.2, -0.1, -0.7])

out = set_at_cum_change(data, step=0.5)

print(out)
array([ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7, 0.7, 0.7, 0.7, 0.1,
0.1, 0.1, 0.1, 0.1, -0.5, -0.5, -0.5, -0.5, -1.2, -0.1, -0.7])
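
The recurrence is a scan: each output depends on the previous output, which is why it resists a one-shot vectorized expression. As an illustration only (not part of the answer above), the same recurrence can be written with the standard library's `itertools.accumulate`. This is a plain-Python sketch and should not be expected to approach numba's speed; the name `scan_step` is an assumption made for this example.

```python
from itertools import accumulate


def scan_step(values, step):
    # accumulate carries the last recorded value as its state;
    # the first element of `values` seeds that state.
    return list(accumulate(
        values,
        lambda kept, v: v if abs(v - kept) > step else kept,
    ))


data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, -0.1,
        -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.9, -1.2, -0.1, -0.7]

out = scan_step(data, 0.5)
print(out)
```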

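If we check the timings, we see a huge speedup of roughly 110,000x (5.78 s versus 50.4 µs) with the numba approach on an array of length 22,000. This not only shows that numba is a great approach in these cases, but also makes it clear that using pandas's iterrows/iteritems is almost always a bad idea: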

def op(data):
    # The asker's iterative pandas implementation
    step = 0.5
    df_steps = pd.Series(data)
    df = df_steps.copy()

    today = None
    yesterday = None
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent
    for index, value in df_steps.items():
        today = deepcopy(index)
        if today is not None and yesterday is not None:
            if abs(df.loc[today] - df_steps.loc[yesterday]) > step:
                df_steps.loc[today] = df.loc[today]
            else:
                df_steps.loc[today] = df_steps.loc[yesterday]

        yesterday = deepcopy(today)
    return df_steps.to_numpy()

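def fn(step):
    # Generator that keeps the last recorded value between send() calls
    current = float('inf')  # guarantees the first value is always recorded
    i = yield

    while True:
        if abs(current - i) > step:
            current = i
            i = yield i
        else:
            i = yield current

def andrej(data):
    df = pd.DataFrame({'data': data})
    f = fn(0.5)
    next(f)  # advance the generator to its first yield
    df['new_data'] = df['data'].apply(lambda x: f.send(x))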

data_large = np.tile(data, 1_000)
print(data_large.shape)
# (22000,)

np.allclose(op(data_large), set_at_cum_change(data_large, step=0.5))
# True

%timeit op(data_large)
# 5.78 s ± 329 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit andrej(data_large)
# 13.6 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit set_at_cum_change(data_large, step=0.5)
# 50.4 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
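
`%timeit` is an IPython magic, so the measurements above only run inside IPython or Jupyter. In a plain script, the standard library's `timeit` module can be used instead. The sketch below times a pure-Python loop (a stand-in with the hypothetical name `py_loop`, not one of the functions benchmarked above) on a 22,000-element input. Absolute numbers depend on the machine, so none are quoted here.

```python
import timeit


def py_loop(values, step):
    # Pure-Python stand-in for the function under test (hypothetical)
    out = []
    kept = None
    for v in values:
        if kept is None or abs(v - kept) > step:
            kept = v
        out.append(kept)
    return out


data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.5, 0.2, 0.1, -0.1,
        -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.9, -1.2, -0.1, -0.7]
data_large = data * 1_000  # list repetition, analogous to np.tile(data, 1_000)

# repeat() returns one total time per run; take the best run
# and divide by the number of calls in that run.
runs = timeit.repeat(lambda: py_loop(data_large, 0.5), number=10, repeat=3)
best_per_call = min(runs) / 10
print(f"{best_per_call * 1e3:.3f} ms per call")
```

When timing a numba function this way, it helps to call it once beforehand so that any compilation cost is not included in the measurement.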

Regarding python - change value after a specific cumulative change, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62181807/
