
python - Create a new array with timesteps and multiple features, e.g. for an LSTM

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:10:10

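Hi, I am using NumPy to create a new array with timesteps and multiple features for an LSTM.

I have looked into several approaches using strides and reshape, but have not managed to find an efficient solution.

Here is a function that solves a toy problem, but I have 30,000 samples, each with 100 features.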

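```python
import numpy as np

def make_timesteps(a, timesteps):
    array = []
    for j in np.arange(len(a)):
        unit = []
        for i in range(timesteps):
            # np.roll(a, -i, axis=0)[j] is row (j + i) % len(a):
            # the i-th row after row j, wrapping around at the end
            unit.append(np.roll(a, -i, axis=0)[j])
        array.append(unit)
    return np.array(array)

inArr = np.array([[1, 2], [3, 4], [5, 6]])
print(inArr.shape)    # (3, 2)

outArr = make_timesteps(inArr, 2)
print(outArr.shape)   # (3, 2, 2)

# Passes: each window holds a row followed by the next row
assert np.array_equal(outArr,
                      np.array([[[1, 2], [3, 4]], [[3, 4], [5, 6]], [[5, 6], [1, 2]]]))
```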
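The expected output above puts row (j + i) mod n at position i of window j, where n is the number of rows. Under that reading, the loop can be replaced by a single fancy-indexing step. This is a minimal sketch of a swapped-in technique, not the asker's or the answerer's code, and `make_timesteps_indexed` is a hypothetical name. Note that it copies the data, so the result holds n × timesteps × features elements.

```python
import numpy as np

def make_timesteps_indexed(a, timesteps):
    """Hypothetical loop-free variant: window j holds rows j, j+1, ... (wrapping)."""
    n = a.shape[0]
    # idx[j, i] = (j + i) % n, an (n, timesteps) matrix of row indices
    idx = (np.arange(n)[:, None] + np.arange(timesteps)) % n
    # Indexing with a 2-D integer array gives shape (n, timesteps, features)
    return a[idx]

inArr = np.array([[1, 2], [3, 4], [5, 6]])
outArr = make_timesteps_indexed(inArr, 2)
print(outArr.shape)  # (3, 2, 2)
```

This removes the Python loops but still materializes every window; the strided approach in the answer avoids that copy.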

Is there a more efficient way (there must be!)? Can anyone help?

Best answer

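One trick is to take the last L-1 rows of the array and prepend them to the start (for forward striding, the first L-1 rows are appended to the end instead). After that, it is a simple case of using NumPy strides, which are very efficient. For those wondering about the cost of this trick: as the timing tests later show, it is next to nothing.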
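To make the padding step concrete, here is a minimal sketch of both padded arrays for the 3 × 2 toy input, with L = 3 chosen only for illustration. Either way the padded array has n + L - 1 rows, which leaves room for exactly n windows of length L.

```python
import numpy as np

inArr = np.array([[1, 2], [3, 4], [5, 6]])
L = 3
n = inArr.shape[0]

# Backward: the last L-1 rows go in front
back_pad = np.vstack((inArr[n - (L - 1):], inArr))
# Forward: the first L-1 rows go at the end
fwd_pad = np.vstack((inArr, inArr[:L - 1]))

print(back_pad.tolist())  # [[3, 4], [5, 6], [1, 2], [3, 4], [5, 6]]
print(fwd_pad.tolist())   # [[1, 2], [3, 4], [5, 6], [1, 2], [3, 4]]

# n + L - 1 rows leave room for exactly n windows of length L
assert back_pad.shape[0] - L + 1 == n
assert fwd_pad.shape[0] - L + 1 == n
```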

Built on this trick, code that supports both forward and backward striding looks like this:

Backward:

```python
import numpy as np

def strided_axis0_backward(inArr, L = 2):
    # INPUTS :
    # inArr : Input array
    # L     : Length along rows to be cut to create per subarray

    # Prepend the last L-1 rows to the start. This lets the output be a view.
    # Slicing from len(inArr)-(L-1) (not -L+1) keeps L = 1 from prepending every row.
    a = np.vstack(( inArr[len(inArr)-(L-1):], inArr ))

    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides

    # Length of 3D output array along its axis=0
    nd0 = m - L + 1

    strided = np.lib.stride_tricks.as_strided
    # Start at row L-1 and step back one row (-s0) per timestep
    return strided(a[L-1:], shape=(nd0,L,n), strides=(s0,-s0,s1))
```
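With the view starting at padded row L-1 and strides (s0, -s0, s1), element [i, k] of the output is padded row L-1+i-k, which is original row (i - k) mod n. The sketch below checks that relationship against a plain modular-index reference for several L. It re-declares the backward function in compact form (an assumption, so the block runs on its own); `rows` and `cols` are illustrative names.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_axis0_backward(inArr, L=2):
    rows, cols = inArr.shape
    a = np.vstack((inArr[rows - (L - 1):], inArr))   # prepend last L-1 rows
    s0, s1 = a.strides
    # start at padded row L-1 and step back one row per timestep
    return as_strided(a[L - 1:], shape=(rows, L, cols), strides=(s0, -s0, s1))

rng = np.random.default_rng(0)
x = rng.integers(0, 9, (7, 3))
for L in range(1, 6):
    out = strided_axis0_backward(x, L)
    # reference: window i holds rows i, i-1, ..., i-L+1 (wrapping around)
    idx = (np.arange(7)[:, None] - np.arange(L)) % 7
    assert np.array_equal(out, x[idx])
```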

Forward:

```python
import numpy as np

def strided_axis0_forward(inArr, L = 2):
    # INPUTS :
    # inArr : Input array
    # L     : Length along rows to be cut to create per subarray

    # Append the first L-1 rows to the end. This lets the output be a view.
    a = np.vstack(( inArr , inArr[:L-1] ))

    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides

    # Length of 3D output array along its axis=0
    nd0 = m - L + 1

    strided = np.lib.stride_tricks.as_strided
    # Start at row 0 of the padded array and step forward one row (+s0) per timestep
    return strided(a, shape=(nd0,L,n), strides=(s0,s0,s1))
```
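On NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of windowed view without hand-written strides. It returns a read-only view by default, which guards against accidental writes through overlapping windows. This is an alternative to the answer's code, not part of it, and `windows_forward` / `windows_backward` are hypothetical names. Because the window axis is appended last, the raw result has shape (n, features, L) and needs `swapaxes(1, 2)` to match the (n, L, features) layout above. Reversing the window axis gives the backward order.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_forward(inArr, L=2):
    a = np.vstack((inArr, inArr[:L - 1]))             # append first L-1 rows
    w = sliding_window_view(a, L, axis=0)             # shape (n, features, L)
    return w.swapaxes(1, 2)                           # shape (n, L, features)

def windows_backward(inArr, L=2):
    n = inArr.shape[0]
    a = np.vstack((inArr[n - (L - 1):], inArr))       # prepend last L-1 rows
    w = sliding_window_view(a, L, axis=0).swapaxes(1, 2)
    return w[:, ::-1, :]                              # current row first, then earlier rows

inArr = np.array([[1, 2], [3, 4], [5, 6]])
print(windows_forward(inArr, 2).tolist())
# [[[1, 2], [3, 4]], [[3, 4], [5, 6]], [[5, 6], [1, 2]]]
print(windows_backward(inArr, 2).tolist())
# [[[1, 2], [5, 6]], [[3, 4], [1, 2]], [[5, 6], [3, 4]]]
```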

Sample run:

```
In [42]: inArr
Out[42]:
array([[1, 2],
       [3, 4],
       [5, 6]])

In [43]: strided_axis0_backward(inArr, 2)
Out[43]:
array([[[1, 2],
        [5, 6]],

       [[3, 4],
        [1, 2]],

       [[5, 6],
        [3, 4]]])

In [44]: strided_axis0_forward(inArr, 2)
Out[44]:
array([[[1, 2],
        [3, 4]],

       [[3, 4],
        [5, 6]],

       [[5, 6],
        [1, 2]]])
```

Runtime test:

```
In [53]: inArr = np.random.randint(0,9,(1000,10))

In [54]: %timeit make_timesteps(inArr, 2)
    ...: %timeit strided_axis0_forward(inArr, 2)
    ...: %timeit strided_axis0_backward(inArr, 2)
    ...:
10 loops, best of 3: 33.9 ms per loop
100000 loops, best of 3: 12.1 µs per loop
100000 loops, best of 3: 12.2 µs per loop

In [55]: %timeit make_timesteps(inArr, 10)
    ...: %timeit strided_axis0_forward(inArr, 10)
    ...: %timeit strided_axis0_backward(inArr, 10)
    ...:
1 loops, best of 3: 152 ms per loop
100000 loops, best of 3: 12 µs per loop
100000 loops, best of 3: 12.1 µs per loop

In [56]: 152000/12.1 # Speedup figure
Out[56]: 12561.98347107438
```

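Even as the length of the subarrays in the output grows, the timings of strided_axis0_forward and strided_axis0_backward stay at about 12 µs. That is because as_strided only creates a view over the padded array; the only copy is the np.vstack padding step, which copies the input once plus L-1 rows. This shows the big benefit of strides, and the huge speedup over the original loop version (roughly 12,500× for L = 10).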
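A short sketch of what "only a view" implies, using the forward construction on the toy input. The windows share memory with the padded array, neighbouring windows overlap in memory (so writing into one window changes another), and a real copy only appears once the data is materialized, for example with `np.ascontiguousarray`.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

inArr = np.array([[1, 2], [3, 4], [5, 6]])
L = 2
a = np.vstack((inArr, inArr[:L - 1]))              # padded buffer (forward case)
s0, s1 = a.strides
out = as_strided(a, shape=(3, L, 2), strides=(s0, s0, s1))

# No data was copied: the windows live inside the padded buffer
print(np.shares_memory(out, a))                    # True

# Overlap: window 0 step 1 and window 1 step 0 are the same memory,
# so this write shows up in both (inArr is untouched, since vstack copied it)
out[0, 1, 0] = 99
print(out[1, 0, 0], inArr[1, 0])                   # 99 3

# Materializing allocates n * L * features elements of fresh memory
dense = np.ascontiguousarray(out)
print(np.shares_memory(dense, a), dense.nbytes == out.size * out.itemsize)  # False True
```

Because the default `as_strided` view is writeable, calling `out.setflags(write=False)` or working on a copy is a common way to avoid this aliasing.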

As promised at the start, here are the timings for the stacking cost with np.vstack:

```
In [417]: inArr = np.random.randint(0,9,(1000,10))

In [418]: L = 10

In [419]: %timeit np.vstack(( inArr[-L+1:], inArr ))
100000 loops, best of 3: 5.41 µs per loop
```

The timings confirm that the stacking step is cheap: about 5.4 µs for this 1000 × 10 input.
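For the asker's data size (30,000 samples × 100 features), a rough sketch of the memory involved, assuming float64 data and L = 10 (both assumptions): the padded buffer is all that is actually allocated, while a dense copy of the windows (for example via `np.ascontiguousarray`) would need about ten times as much.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

n, features, L = 30_000, 100, 10                   # asker's sizes; L = 10 is assumed
x = np.zeros((n, features))                         # float64 placeholder data

a = np.vstack((x, x[:L - 1]))                       # forward padding: (n + L - 1, features)
s0, s1 = a.strides
out = as_strided(a, shape=(n, L, features), strides=(s0, s0, s1))

padded_bytes = a.nbytes                             # memory actually allocated (~24 MB)
dense_bytes = out.size * out.itemsize               # what a full copy would need (~240 MB)
print(padded_bytes, dense_bytes)                    # 24007200 240000000
```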

For this question (python - Create a new array with timesteps and multiple features, e.g. for an LSTM), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43207918/
