
Pythonic way to find indexes of unique elements in two arrays

Reposted. Author: 行者123. Updated: 2023-12-05 02:43:47

I have two sorted numpy arrays similar to these:

x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])

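Elements never repeat within the same array. I'd like a Pythonic way to get a list of index pairs giving the positions where the same element occurs in both arrays.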

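For example, 1 is present at index 0 in both x and y. The element 2 in x is not present in y, so I don't care about that item. However, 8 is present in both arrays: at index 2 in x, but at index 1 in y. Similarly, 15 is present in both, at index 4 in x but at index 2 in y. So the result of my function would be a list, which in this case is [[0, 0], [2, 1], [4, 2]].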
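For small arrays, the pairs described above can be produced directly by comparing every element of x with every element of y through broadcasting. This is only a sketch to pin down the expected output: the boolean matrix it builds has len(x) * len(y) entries, so it is not an option for arrays with millions of items.

```python
import numpy as np

x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])

# Compare every x[i] with every y[j]: boolean matrix of shape (len(x), len(y))
equal = x[:, None] == y[None, :]

# Each row of argwhere is an (i, j) pair with x[i] == y[j], in row-major order
pairs = np.argwhere(equal)
print(pairs.tolist())  # [[0, 0], [2, 1], [4, 2]]
```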

What I'm doing so far is:

def get_indexes(x, y):
    indexes = []
    for i in range(len(x)):
        # Find index where item x[i] is in y:
        j = np.where(x[i] == y)[0]

        # If it exists, save it:
        if len(j) != 0:
            indexes.append([i, j[0]])

    return indexes

The problem is that the arrays x and y are very large (millions of items), so this takes quite a long time. Is there a better, more Pythonic way to do this?
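Part of the cost is that each np.where call scans all of y for a single x[i], so the loop does roughly len(x) * len(y) comparisons. Because y is sorted, the lookup can instead be done for all of x at once with one vectorized binary search. The helper name get_indexes_searchsorted is hypothetical, and the sketch assumes, as the question states, that both arrays are sorted and contain no repeats; it is a loop-free variant of calling np.searchsorted element by element.

```python
import numpy as np

def get_indexes_searchsorted(x, y):
    """Hypothetical helper: [i, j] pairs with x[i] == y[j], assuming x and y
    are sorted and neither contains repeated values."""
    x = np.asarray(x)
    y = np.asarray(y)
    if y.size == 0:
        return np.empty((0, 2), dtype=np.intp)
    # Position where each x[i] would be inserted into y to keep y sorted
    pos = np.searchsorted(y, x)
    # Values larger than y[-1] get pos == len(y); clip so indexing stays valid
    pos = np.minimum(pos, y.size - 1)
    # x[i] occurs in y exactly when the element at that position equals it
    found = y[pos] == x
    return np.column_stack((np.flatnonzero(found), pos[found]))

x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
print(get_indexes_searchsorted(x, y).tolist())  # [[0, 0], [2, 1], [4, 2]]
```

This does one O(log len(y)) search per element of x inside NumPy, instead of one full scan of y per element in a Python loop.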

Best answer

Without Python loops

Code

def get_indexes_darrylg(x, y):
    ' darrylg answer '
    # Use intersect to find common elements between two arrays
    overlap = np.intersect1d(x, y)

    # Indexes of common elements in each array
    loc1 = np.searchsorted(x, overlap)
    loc2 = np.searchsorted(y, overlap)

    # Zip the two 1D index arrays into a 2D array of [i, j] pairs
    return np.dstack((loc1, loc2))[0]

Usage

x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
result = get_indexes_darrylg(x, y)

# result: array([[0, 0],
#                [2, 1],
#                [4, 2]], dtype=int64)
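np.intersect1d can also return the positions of the common values directly through its return_indices option (available since NumPy 1.15), which removes the two extra searchsorted passes. A minimal sketch, with a hypothetical helper name; assume_unique=True relies on the question's guarantee that no element repeats within an array, and should be dropped if that guarantee does not hold.

```python
import numpy as np

def get_indexes_intersect(x, y):
    """Hypothetical helper: [i, j] pairs with x[i] == y[j] via intersect1d."""
    # Returns the common values plus their indexes in x and in y
    _, idx_x, idx_y = np.intersect1d(x, y, assume_unique=True, return_indices=True)
    # Combine the two index arrays into a (k, 2) array of pairs
    return np.column_stack((idx_x, idx_y))

x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
print(get_indexes_intersect(x, y).tolist())  # [[0, 0], [2, 1], [4, 2]]
```

This variant was not part of the timing comparison, so no claim is made here about its speed relative to the posted solutions.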

Timing of the posted solutions

The results show that darrylg's code has the fastest run time.

[Figure: perfplot timings of the five posted solutions versus array length]

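Code adjustments

  • Wrapped each posted solution in a function.
  • Made minor modifications so that each solution outputs a numpy array.
  • Named each curve after the poster of the solution.

Code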

import numpy as np
import perfplot

def create_arr(n):
    ' Creates pair of 1d numpy arrays with half the elements equal '
    max_val = 100000  # One more than largest value in output arrays

    arr1 = np.random.randint(0, max_val, (n,))
    arr2 = arr1.copy()

    # Change half the elements in arr2
    all_indexes = np.arange(0, n, dtype=int)
    indexes = np.random.choice(all_indexes, size=n//2, replace=False)  # locations to make changes

    np.put(arr2, indexes, np.random.randint(0, max_val, (n//2,)))  # assign new random values at change locations

    arr1 = np.sort(arr1)
    arr2 = np.sort(arr2)

    return (arr1, arr2)

def get_indexes_lllrnr101(x, y):
    ' lllrnr101 answer '
    ans = []
    i = 0
    j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            ans.append([i, j])
            i += 1
            j += 1
        elif x[i] < y[j]:
            i += 1
        else:
            j += 1
    return np.array(ans)

def get_indexes_joostblack(x, y):
    'joostblack'
    indexes = []
    for idx, val in enumerate(x):
        idy = np.searchsorted(y, val)
        try:
            if y[idy] == val:
                indexes.append([idx, idy])
        except IndexError:
            continue  # ignore index errors (val is larger than every element of y)

    return np.array(indexes)

def get_indexes_mustafa(x, y):
    indices_in_x = np.flatnonzero(np.isin(x, y))  # array([0, 2, 4])
    indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x]))  # array([0, 1, 2])

    return np.array(list(zip(indices_in_x, indices_in_y)))

def get_indexes_darrylg(x, y):
    ' darrylg answer '
    # Use intersect to find common elements between two arrays
    overlap = np.intersect1d(x, y)

    # Indexes of common elements in each array
    loc1 = np.searchsorted(x, overlap)
    loc2 = np.searchsorted(y, overlap)

    # Zip the two 1D index arrays into a 2D array of [i, j] pairs
    return np.dstack((loc1, loc2))[0]

def get_indexes_akopcz(x, y):
    ' akopcz answer '
    return np.array([
        [i, j]
        for i, nr in enumerate(x)
        for j in np.where(nr == y)[0]
    ])

perfplot.show(
    setup=create_arr,  # tuple of two 1D random arrays
    kernels=[
        lambda a: get_indexes_lllrnr101(*a),
        lambda a: get_indexes_joostblack(*a),
        lambda a: get_indexes_mustafa(*a),
        lambda a: get_indexes_darrylg(*a),
        lambda a: get_indexes_akopcz(*a),
    ],
    labels=["lllrnr101", "joostblack", "mustafa", "darrylg", "akopcz"],
    n_range=[2 ** k for k in range(5, 21)],
    xlabel="Array Length",
    # More optional arguments with their default values:
    # logx="auto",  # set to True or False to force scaling
    # logy="auto",
    equality_check=None,  # np.allclose; set to None to disable the "correctness" assertion
    # show_progress=True,
    # target_time_per_measurement=1.0,
    # time_unit="s",  # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
    # relative_to=1,  # plot the timings relative to one of the measurements
    # flops=lambda n: 3*n,  # FLOPS plots
)
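create_arr draws its values with np.random.randint, so for large n the arrays almost certainly contain repeated values, which the question rules out ("elements never repeat within the same array"). With repeats the solutions can legitimately return different pairs, which may be why equality_check is set to None. A sketch of a duplicate-free setup, under the hypothetical name create_unique_arr: it draws all values at once without replacement, so each array is strictly increasing and exactly n // 2 values are shared.

```python
import numpy as np

def create_unique_arr(n, seed=None):
    """Hypothetical setup: two sorted int arrays of length n, each free of
    repeats, sharing exactly n // 2 values."""
    rng = np.random.default_rng(seed)
    k = n // 2                 # number of shared values
    pool_size = 2 * n - k      # distinct values needed in total
    max_val = 10 * pool_size   # value range; any bound >= pool_size works
    # Sampling without replacement guarantees every value in the pool is distinct
    pool = rng.choice(max_val, size=pool_size, replace=False)
    shared, only1, only2 = pool[:k], pool[k:n], pool[n:]
    arr1 = np.sort(np.concatenate((shared, only1)))
    arr2 = np.sort(np.concatenate((shared, only2)))
    return arr1, arr2

a, b = create_unique_arr(1000, seed=0)
print(len(a), len(b), len(np.intersect1d(a, b)))  # 1000 1000 500
```

Since perfplot calls setup with n only, it could be passed as setup=create_unique_arr. On such inputs all five functions should return the same pairs in the same order, so an array-equality check could plausibly be re-enabled.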

For the Pythonic way to find indexes of unique elements in two arrays, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66781291/
