gpt4 book ai didi

python - 从第三个数组中的两个数组中有效地获取每对元素的最小值

转载 作者:太空狗 更新时间:2023-10-30 00:47:47 30 4
gpt4 key购买 nike

我有两个 N float 数组(充当 (x,y) 坐标并且可能有重复项)和关联的 z N float 组(充当坐标的权重)。

对于每个 (x,y) 浮点对,我需要选择具有最小关联 z 值的对。我已经定义了一个执行此操作的 selectMinz() 函数(请参阅下面的代码),但它花费的时间太长。

我怎样才能提高这个函数的性能?

import numpy as np
import time


def getData():
N = 100000
x = np.arange(0.0005, 0.03, 0.001)
y = np.arange(6., 10., .05)
# Select N values for x,y, where values can be repeated
x = np.random.choice(x, N)
y = np.random.choice(y, N)
z = np.random.uniform(10., 15., N)
return x, y, z


def selectMinz(x, y, z):
"""
Select the minimum z for each (x,y) pair.
"""
xy_unq, z_unq = [], []
# For each (x,y) pair
for i, xy in enumerate(zip(*[x, y])):
# If this xy pair was already stored in the xy_unq list
if xy in xy_unq:
# If the stored z value associated with this xy pair is
# larger than this new z[i] value
if z_unq[xy_unq.index(xy)] > z[i]:
# Store this smaller value instead
z_unq[xy_unq.index(xy)] = z[i]
else:
# Store the xy pair, and its associated z value
xy_unq.append(xy)
z_unq.append(z[i])

return xy_unq, z_unq


# Define data with the proper format.
x, y, z = getData()

s = time.clock()
xy_unq, z_unq = selectMinz(x, y, z) # <-- TAKES TOO LONG (~15s in my system)
print(time.clock() - s)

最佳答案

步骤:

  1. 使用lex-sort 使x-y 对按顺序排列。或者,我们可以使用缩放方法将一个数组按另一个数组的值范围缩放,然后将其与另一个数组相加,最后使用 argsort 获得 lex 排序的等效索引。
  2. 使用 np.minimum.reduceat 获取间隔中的最小值,由对分组定义。

因此,我们将有一个矢量化解决方案,就像这样 -

def selectMinz_vectorized(x, y, z):
# Get grouped lex-sort indices
sidx = (y + x*(y.max() - y.min() + 1)).argsort()
# or sidx = np.lexsort([x, y])

# Lex-sort x, y, z
x_sorted = x[sidx]
y_sorted = y[sidx]
z_sorted = z[sidx]

# Get equality mask between each sorted X and Y elem against previous ones.
# The non-zero indices of its inverted mask gives us the indices where the
# new groupings start. We are calling those as cut_idx.
seq_eq_mask = (x_sorted[1:] == x_sorted[:-1]) & (y_sorted[1:] == y_sorted[:-1])
cut_idx = np.flatnonzero(np.concatenate(( [True], ~seq_eq_mask)))

# Use those cut_idx to get intervalled minimum values
minZ = np.minimum.reduceat(z_sorted, cut_idx)

# Make tuples of the groupings of x,y and the corresponding min Z values
return (zip(x_sorted[cut_idx], y_sorted[cut_idx]), minZ.tolist())

sample 运行-

In [120]: np.c_[x,y,z]
Out[120]:
array([[ 0., 1., 69.],
[ 2., 0., 47.],
[ 1., 0., 62.],
[ 0., 2., 33.],
[ 1., 7., 32.],
[ 1., 0., 50.],
[ 2., 0., 55.]])

In [121]: selectMinz(x,y,z) # original method
Out[121]:
([(0.0, 1.0), (2.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 7.0)],
[69.0, 47.0, 50.0, 33.0, 32.0])

In [122]: selectMinz_vectorized(x,y,z)
Out[122]:
([(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 7.0)],
[50.0, 47.0, 69.0, 33.0, 32.0])

这是我最初的方法,涉及创建堆叠数组然后执行这些操作。实现看起来像这样 -

def selectMinz_vectorized_v2(x, y, z):
d = np.column_stack((x,y,z))
sidx = np.lexsort(d[:,:2].T)
b = d[sidx]
cut_idx = np.r_[0,np.flatnonzero(~(b[1:,:2] == b[:-1,:2]).all(1))+1]
minZ = np.minimum.reduceat(b[:,-1], cut_idx)
return ([tuple(i) for i in b[cut_idx,:2]], minZ.tolist())

矢量化方法的基准测试

方法-

# Pruned version of the approach posted earlier
def selectMinz_vectorized_pruned(x, y, z):
sidx = (y + x*(y.max() - y.min() + 1)).argsort()
x_sorted = x[sidx]
y_sorted = y[sidx]
z_sorted = z[sidx]
seq_eq_mask = (x_sorted[1:] == x_sorted[:-1]) & (y_sorted[1:] == y_sorted[:-1])
cut_idx = np.flatnonzero(np.concatenate(( [True], ~seq_eq_mask)))
minZ = np.minimum.reduceat(z_sorted, cut_idx)
return x_sorted[cut_idx], y_sorted[cut_idx], minZ

def numpy_indexed_app(x,y,z): # @Eelco Hoogendoorn's soln
return npi.group_by((x, y)).min(z)

时间 -

In [141]: x,y,z=getData(10000)

In [142]: %timeit selectMinz_vectorized_pruned(x, y, z)
...: %timeit numpy_indexed_app(x,y,z)
...:
1000 loops, best of 3: 763 µs per loop
1000 loops, best of 3: 1.09 ms per loop

In [143]: x,y,z=getData(100000)

In [144]: %timeit selectMinz_vectorized_pruned(x, y, z)
...: %timeit numpy_indexed_app(x,y,z)
...:
100 loops, best of 3: 8.53 ms per loop
100 loops, best of 3: 12.9 ms per loop

关于python - 从第三个数组中的两个数组中有效地获取每对元素的最小值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45887270/

30 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com