
python - Counting values in overlapping sliding windows in Python

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:08:39

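Given an array a of sorted values and an array of ranges bins, what is the most efficient way to count how many values in a fall within each range, rng, in bins?

Currently I am doing the following: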

import numpy as np

def sliding_count(a, end, window, start=0, step=1):
    # One closed interval [x, x + window] per window start
    bins = [(x, x + window) for x in range(start, (end + 1) - window, step)]
    counts = np.zeros(len(bins))
    for i, rng in enumerate(bins):
        # Scan all of a for every window
        count = len(a[np.where(np.logical_and(a>=rng[0], a<=rng[1]))])
        counts[i] = count
    return counts

a = np.array([1, 5, 8, 11, 14, 19])
end = 20
window = 10
sliding_count(a, end, window)

returns the expected array

array([3., 4., 3., 3., 4., 4., 3., 3., 3., 3., 3.])

But I feel there must be a more efficient way to do this?

Best answer

import numpy as np

def alt(a, end, window, start=0, step=1):
    # Left and right edges of every window, as arrays
    bin_starts = np.arange(start, end+1-window, step)
    bin_ends = bin_starts + window
    # Index one past the last element of a that is <= each window end
    last_index = np.searchsorted(a, bin_ends, side='right')
    # Index of the first element of a that is >= each window start
    first_index = np.searchsorted(a, bin_starts, side='left')
    return last_index - first_index

def sliding_count(a, end, window, start=0, step=1):
    bins = [(x, x + window) for x in range(start, (end + 1) - window, step)]
    counts = np.zeros(len(bins))
    for i, rng in enumerate(bins):
        count = len(a[np.where(np.logical_and(a>=rng[0], a<=rng[1]))])
        counts[i] = count
    return counts

a = np.array([1, 5, 8, 11, 14, 19])
end = 20
window = 10

print(sliding_count(a, end, window))
# [3. 4. 3. 3. 4. 4. 3. 3. 3. 3. 3.]

print(alt(a, end, window))
# [3 4 3 3 4 4 3 3 3 3 3]

How alt works:

Generate the start and end values of the bins:

In [73]: bin_starts = np.arange(start, end+1-window, step); bin_starts
Out[73]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [74]: bin_ends = bin_starts + window; bin_ends
Out[74]: array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

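Since a is sorted, you can use np.searchsorted to find, for each value in bin_starts and bin_ends, the index in a where that value would be inserted. With side='left' for the starts and side='right' for the ends, these are the index of the first element of a inside each bin and the index one past the last: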

In [75]: last_index = np.searchsorted(a, bin_ends, side='right'); last_index
Out[75]: array([3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6])

In [76]: first_index = np.searchsorted(a, bin_starts, side='left'); first_index
Out[76]: array([0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3])

The count is just the difference between the two indices:

In [77]: last_index - first_index
Out[77]: array([3, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3])
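
The choice of side matters only when a value in a sits exactly on a window edge. As a small illustration (the window edges lo and hi below are picked for this example, not taken from the answer), the following sketch compares the searchsorted count with a brute-force count on a closed window, and shows what would change for a half-open window:

```python
import numpy as np

a = np.array([1, 5, 8, 11, 14, 19])  # sorted input from the question

# Illustrative closed window [lo, hi] whose edges both coincide with values in a
lo, hi = 1, 11

# side='left': index of the first element >= lo
first = np.searchsorted(a, lo, side='left')
# side='right': index one past the last element <= hi
last = np.searchsorted(a, hi, side='right')

count = last - first
brute = np.count_nonzero((a >= lo) & (a <= hi))
print(count, brute)  # 4 4  (the values 1, 5, 8, 11)

# For a half-open window [lo, hi), use side='left' for both edges
half_open = (np.searchsorted(a, hi, side='left')
             - np.searchsorted(a, lo, side='left'))
print(half_open)  # 3  (11 is now excluded)
```

Since the question's loop uses a>=rng[0] and a<=rng[1], the left/right pairing is the one that reproduces its results.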

Here is a perfplot comparing the performance of alt and sliding_count as a function of the length of a:

import perfplot

def make_array(N):
    a = np.random.randint(10, size=N)
    a = a.cumsum()
    return a

def using_sliding(a):
    return sliding_count(a, end, window)

def using_alt(a):
    return alt(a, end, window)

perfplot.show(
    setup=make_array,
    kernels=[using_sliding, using_alt],
    n_range=[2**k for k in range(22)],
    logx=True,
    logy=True,
    xlabel='len(a)')

[Figure: perfplot log-log plot of runtime against len(a) for using_sliding and using_alt]

Perfplot also checks that the value returned by using_sliding equals the value returned by using_alt.
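
The same kind of check can be run without perfplot. The sketch below is a stand-alone version in that spirit, not the check perfplot itself performs: it redefines both functions so it runs on its own, and the random seed and the list of sizes are arbitrary choices made for this example.

```python
import numpy as np

def sliding_count(a, end, window, start=0, step=1):
    # Loop-based reference: scan all of a once per window
    bins = [(x, x + window) for x in range(start, (end + 1) - window, step)]
    counts = np.zeros(len(bins))
    for i, (lo, hi) in enumerate(bins):
        counts[i] = np.count_nonzero((a >= lo) & (a <= hi))
    return counts

def alt(a, end, window, start=0, step=1):
    # Binary-search version from the answer
    bin_starts = np.arange(start, end + 1 - window, step)
    bin_ends = bin_starts + window
    return (np.searchsorted(a, bin_ends, side='right')
            - np.searchsorted(a, bin_starts, side='left'))

gen = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
for n in (0, 1, 10, 1000):
    # Cumulative sum of non-negative integers gives a sorted array
    a = gen.integers(10, size=n).cumsum()
    assert np.array_equal(sliding_count(a, 20, 10), alt(a, 20, 10))
```

Note that sliding_count returns a float array and alt an integer array; np.array_equal compares values, not dtypes, so the two still compare equal.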

Matt Timmermans' idea, "subtract position_in_a from the count for that bin", triggered this solution.
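
The comment is quoted only in part here, so the following is not a reconstruction of it. It is a related pure-Python sketch, under the assumption that step is positive, of the same index bookkeeping done with two moving pointers instead of np.searchsorted. The function name two_pointer_count is made up for this example.

```python
def two_pointer_count(a, end, window, start=0, step=1):
    # Hypothetical helper, assuming a is sorted and step > 0.
    # Window starts and ends both increase, so neither pointer moves backwards.
    counts = []
    first = last = 0
    n = len(a)
    for lo in range(start, end + 1 - window, step):
        hi = lo + window
        # Advance first to the first element >= lo
        while first < n and a[first] < lo:
            first += 1
        # Advance last to one past the last element <= hi
        while last < n and a[last] <= hi:
            last += 1
        counts.append(last - first)
    return counts

a = [1, 5, 8, 11, 14, 19]
print(two_pointer_count(a, 20, 10))
# [3, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3]
```

Each pointer advances at most len(a) times in total, so the work is linear in len(a) plus the number of windows. In practice the vectorized np.searchsorted version is likely to be faster in Python, since this loop runs in the interpreter.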

Regarding python - Counting values in overlapping sliding windows in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54237254/
