
python - Stacking arrays with overlapping indices: looking for a vectorized alternative to a loop

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:51:08

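I'm looking for a vectorized way to loop over array indices so that I can stack them vertically in groups with overlapping indices.
To give the gist of what I want to achieve: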

Given the list [1,2,3,4,5,6], an interval variable with value 2 and an overlap variable with value 1, the output should look like this: [[1,2],[2,3],[3,4],[4,5],[5,6]]
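For the toy list, this windowing can be expressed without an explicit Python loop. A minimal sketch, assuming NumPy 1.20 or later (which provides sliding_window_view); step is a name introduced here for interval minus overlap:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([1, 2, 3, 4, 5, 6])
interv, overlapby = 2, 1
step = interv - overlapby  # distance between consecutive window starts

# sliding_window_view returns every length-interv window as a read-only view (no copy);
# slicing with ::step keeps only the windows whose starts are step elements apart.
windows = sliding_window_view(values, interv)[::step]
print(windows.tolist())  # [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
```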

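However, my actual data has shape 1560x2x87236, where 1560 is the number of subjects and each 2x87236 slice is an x,y trajectory. So for each subject I have 87236 x points and 87236 y points. It is essential that the transformation preserves the size-2 dimension that holds the xs and ys.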
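Windowing only along the last axis leaves the x/y axis intact, which can be checked on a small stand-in array. A minimal sketch, assuming NumPy 1.20 or later; the array trajectories and its shape (4, 2, 10) are made up for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical stand-in for the real (1560, 2, 87236) data: 4 subjects, x/y rows, 10 points.
trajectories = np.arange(4 * 2 * 10).reshape(4, 2, 10)
interv, overlapby = 3, 1
step = interv - overlapby

# Windows are taken along axis 2 only, so axis 1 (x, y) is untouched.
windows = sliding_window_view(trajectories, interv, axis=2)[:, :, ::step]
print(windows.shape)  # (4, 2, 4, 3): subject, x/y, window index, position in window
```

The size-2 axis survives unchanged; to reach the (n_windows, 2, interv) layout the window axis still has to be moved in front of it, for example with transpose(0, 2, 1, 3) followed by a reshape.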


To make this easier to illustrate:

Suppose I have an ndarray:

arr

array([[[35, 33, 34, 42, 32, 30],
        [22, 38, 29, 33, 25, 14]],

       [[17, 25, 39, 17, 41, 22],
        [22, 13, 14, 31, 20, 38]],

       [[30, 10, 33, 25, 38, 26],
        [28, 27, 19, 27, 43, 13]]])

arr.shape

(3, 2, 6)

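What I want to do is stack this array in groups (intervals) of 3 with overlapping indices (an overlap of 1 index). The output should look like this: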

stacked_arr

array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[35., 33., 34.],
        [22., 38., 29.]],

       [[34., 42., 32.],
        [29., 33., 25.]],

       [[17., 25., 39.],
        [22., 13., 14.]],

       [[39., 17., 41.],
        [14., 31., 20.]],

       [[30., 10., 33.],
        [28., 27., 19.]],

       [[33., 25., 38.],
        [19., 27., 43.]]])

stacked_arr.shape

(7, 2, 3)
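In the expected output, each subject contributes two windows after the leading all-zero block (which comes from how the function below initialises its result). Which windows appear follows from simple start-index arithmetic. A small sketch; n_points and step are names introduced here:

```python
n_points = 6               # length of the last axis in the example
interv, overlapby = 3, 1
step = interv - overlapby  # distance between consecutive window starts

# Starts of windows that fit entirely inside the data: columns 0:3 and 2:5.
full_starts = list(range(0, n_points - interv + 1, step))
print(full_starts)  # [0, 2]

# A loop over range(0, n_points, step) also visits start 4, whose
# window (columns 4:6) is one element short and would need padding.
loop_starts = list(range(0, n_points, step))
print(loop_starts)  # [0, 2, 4]
```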

Here is the function I wrote to produce the result above:

import cupy as cp

def overlap_stack(data, padwith, interv, overlapby):
    sub = 0

    # Initialise with one all-zero block of shape (1, 2, interv):
    # 1 for a single window, 2 for the x and y rows.
    # This zero block remains the first entry of the result.
    stacked = cp.zeros(shape=(1, 2, interv))
    while sub < data.shape[0]:
        for idx in range(0, data.shape[2], interv - overlapby):

            # grouping with overlaps: shape (1, 2, <= interv)
            stack = cp.expand_dims(data[sub, :, idx: idx + interv], axis=0)

            # pad the last, shorter window(s) to length interv
            if stack.shape[2] < interv:
                stack = cp.pad(stack, ((0, 0), (0, 0), (0, interv - stack.shape[2])), 'constant',
                               constant_values=padwith)

            # stacking all together (each vstack copies everything stacked so far)
            stacked = cp.vstack((stacked, stack))

        sub += 1
    return stacked

Converting the 1560x2x87236 array takes upwards of 8 to 10 hours. I'd be grateful for any help speeding this up.
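A large part of that time is probably the cp.vstack call inside the loop: every call allocates a new array and copies all windows stacked so far, so total copying grows quadratically with the number of windows (about 68 million windows for interv=3, overlapby=1). Collecting the windows in a list and stacking once removes that cost even before full vectorisation. A sketch written against NumPy in place of CuPy; overlap_stack_list is a hypothetical name:

```python
import numpy as np

def overlap_stack_list(data, padwith, interv, overlapby):
    """Hypothetical rewrite of the question's loop (NumPy instead of CuPy).

    Produces the same windows as overlap_stack, minus the leading zero block,
    but builds the result with a single allocation at the end.
    """
    step = interv - overlapby
    windows = []
    for sub in range(data.shape[0]):
        for idx in range(0, data.shape[2], step):
            win = data[sub, :, idx:idx + interv]  # shape (2, <= interv)
            short = interv - win.shape[1]
            if short > 0:
                # pad the trailing, shorter window on the right
                win = np.pad(win, ((0, 0), (0, short)), mode="constant",
                             constant_values=padwith)
            windows.append(win)
    # one allocation and one copy, instead of one full copy per window
    return np.stack(windows)  # (n_windows_total, 2, interv)
```

This is still a Python-level double loop, so it will be slower than a stride-based approach, but it no longer re-copies the growing result on every iteration.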

Best answer

I don't know whether you're familiar with numpy.lib.stride_tricks.as_strided, but here is a solution that uses it:

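import numpy as np
from numpy.lib.stride_tricks import as_strided

def overlap_stack(data, interv, overlapby):
    # Stack all subjects' x/y rows into one 2-D array: (n_subjects * 2, n_points)
    A = np.vstack(data)

    # A window spans both rows of one subject and interv columns
    window_size = (data.shape[1], interv)
    # Step between windows: one subject (2 rows) down, interv - overlapby columns across
    strides = (window_size[0], interv - overlapby)

    # Byte strides: the first two axes jump between windows, the last two move inside one
    output_strides = (strides[0]*A.strides[0], strides[1]*A.strides[1]) + A.strides

    output_shape = ((A.shape[0] - window_size[0])//strides[0] + 1,
                    (A.shape[1] - window_size[1])//strides[1] + 1) + window_size

    # Merge the (subject, window) grid into one axis: (n_windows_total, 2, interv)
    return as_strided(A, shape=output_shape, strides=output_strides).reshape(-1, *output_shape[2:])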

I've left out the padding because I'm not sure how you want it (you can add it yourself, though).
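One way to add the padding, sketched with sliding_window_view (NumPy 1.20 or later) instead of hand-computed strides: pad the last axis just enough that every start in range(0, n, step) gets a full window, then slice windows in one shot. overlap_stack_swv and its padwith=None convention are hypothetical; with padwith=None it returns the same windows as the as_strided version, and with a pad value it matches the question's loop minus its leading zero block:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def overlap_stack_swv(data, interv, overlapby, padwith=None):
    """Windows of length interv along the last axis, interv - overlapby apart.

    Returns shape (n_subjects * n_windows, 2, interv).
    """
    step = interv - overlapby
    if padwith is not None:
        n = data.shape[-1]
        n_windows = -(-n // step)  # ceil(n / step) == len(range(0, n, step))
        pad = max(0, (n_windows - 1) * step + interv - n)
        data = np.pad(data, ((0, 0), (0, 0), (0, pad)), mode="constant",
                      constant_values=padwith)
    # (subjects, 2, n_positions, interv) -> keep every step-th window
    win = sliding_window_view(data, interv, axis=-1)[:, :, ::step]
    # move the window axis in front of x/y, then merge subject and window axes;
    # the reshape copies because the windowed view is not contiguous
    return win.transpose(0, 2, 1, 3).reshape(-1, data.shape[1], interv)
```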

For example:

data = np.array([[[35, 33, 34, 42, 32, 30],
                  [22, 38, 29, 33, 25, 14]],
                 [[17, 25, 39, 17, 41, 22],
                  [22, 13, 14, 31, 20, 38]],
                 [[30, 10, 33, 25, 38, 26],
                  [28, 27, 19, 27, 43, 13]]])

overlap_stack(data, 3, 1)

array([[[35, 33, 34],
        [22, 38, 29]],

       [[34, 42, 32],
        [29, 33, 25]],

       [[17, 25, 39],
        [22, 13, 14]],

       [[39, 17, 41],
        [14, 31, 20]],

       [[30, 10, 33],
        [28, 27, 19]],

       [[33, 25, 38],
        [19, 27, 43]]])
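To see what the stride arithmetic does, the intermediate values can be printed for this example. A sketch assuming 8-byte integers (np.int64); window_step is a renamed copy of the answer's strides variable:

```python
import numpy as np

# The example data with an explicit 8-byte dtype (byte strides depend on itemsize).
data = np.array([[[35, 33, 34, 42, 32, 30], [22, 38, 29, 33, 25, 14]],
                 [[17, 25, 39, 17, 41, 22], [22, 13, 14, 31, 20, 38]],
                 [[30, 10, 33, 25, 38, 26], [28, 27, 19, 27, 43, 13]]], dtype=np.int64)
interv, overlapby = 3, 1

A = np.vstack(data)                                 # shape (6, 6)
window_size = (data.shape[1], interv)               # (2, 3)
window_step = (window_size[0], interv - overlapby)  # (2, 2), called strides in the answer

output_strides = (window_step[0] * A.strides[0],
                  window_step[1] * A.strides[1]) + A.strides
output_shape = ((A.shape[0] - window_size[0]) // window_step[0] + 1,
                (A.shape[1] - window_size[1]) // window_step[1] + 1) + window_size

print(A.strides)       # (48, 8): one row is 6 items * 8 bytes
print(output_strides)  # (96, 16, 48, 8)
print(output_shape)    # (3, 2, 2, 3)
```

The first byte stride, 96, jumps two rows of A, which is exactly one subject, so a window never straddles two subjects; the second, 16, moves interv - overlapby = 2 columns to the next window.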

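Note that for an array of shape (1560, 2, 87236) this will be very fast, but it will use a lot of memory: np.vstack copies the input, and because the strided view is not contiguous, the final reshape copies every window into a new array.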
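The size of the result can be estimated up front. A sketch that assumes the example's interv=3 and overlapby=1 (the question does not state the values used for the real data) and 8-byte items (float64 or int64); windowed_nbytes is a helper introduced here:

```python
def windowed_nbytes(n_subjects, n_points, interv, overlapby, itemsize=8):
    """Bytes in the (n_windows_total, 2, interv) result of the as_strided answer."""
    step = interv - overlapby
    n_windows = (n_points - interv) // step + 1  # full windows per subject
    return n_subjects * n_windows * 2 * interv * itemsize

nbytes = windowed_nbytes(1560, 87236, interv=3, overlapby=1)
print(nbytes)          # 3266040960
print(nbytes / 2**30)  # about 3.04 GiB
```

With a window of 3 and a step of 2 the output holds about 1.5 times as many values as the input, and np.vstack(data) adds one more full copy of the input on top of that.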

A similar question, python - Stacking arrays with overlapping indices: looking for a vectorized alternative to a loop, can be found on Stack Overflow: https://stackoverflow.com/questions/59833898/
