
python - Generating unique values from rows in a numpy array

Reposted. Author: 行者123. Updated: 2023-11-28 17:36:58

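I have a 3D numpy array, arr, with shape m*n*k.

For every set of values along the m axis (e.g. arr[:, 0, 0]), I want to generate a single value that represents that set, so that I end up with a 2D matrix of shape n*k. If a set of values along the m axis is repeated, the same value should be generated each time.

In other words, this is a hashing problem.

I built a solution using a dictionary, but it drastically reduces performance. For each set of values, I call this function: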

def getCellId(self, valueSet):

    # Turn the set of values (a numpy vector) to a tuple so it can be hashed
    key = tuple(valueSet)

    # Try and simply return an existing ID for this key
    try:
        return self.attributeDict[key]
    except KeyError:

        # If the key was new (and didn't exist), try and generate a new Id by
        # adding one to the max of all current Id's. This will fail the very
        # first time we do this (as there will be no Id's yet), so in that
        # case, just assign the value '1' to the newId
        try:
            newId = max(self.attributeDict.values()) + 1
        except ValueError:
            newId = 1
        self.attributeDict[key] = newId
        return newId

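The array itself is typically of size 30*256*256, so a single set of values has 30 values. I have hundreds of these arrays to process at any one time. Currently, all the processing that needs to be done up to calculating the hash takes 1.3 seconds for a block of 100 arrays. Including the hashing bumps that up to 75 seconds.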
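As an aside on the dictionary approach above: one likely contributor to the slowdown is that max(self.attributeDict.values()) scans every stored ID each time a new key appears. Because IDs are handed out sequentially starting at 1 and never removed, the next ID equals the dictionary's current size plus one. The sketch below is only an illustration under that assumption; the CellIdRegistry class name and the use of tobytes() as the key are hypothetical additions, not part of the original code.

```python
import numpy as np


class CellIdRegistry:
    """Hypothetical helper: maps each distinct value set to a stable integer ID."""

    def __init__(self):
        self.attributeDict = {}

    def getCellId(self, valueSet):
        # tobytes() yields a hashable key without building a 30-element tuple.
        # Assumption: every valueSet has the same dtype and length.
        key = np.ascontiguousarray(valueSet).tobytes()
        try:
            return self.attributeDict[key]
        except KeyError:
            # IDs are assigned sequentially from 1 and never deleted, so the
            # next ID is the current size plus one (no scan over all values).
            newId = len(self.attributeDict) + 1
            self.attributeDict[key] = newId
            return newId


registry = CellIdRegistry()
a = registry.getCellId(np.array([1, 2, 3]))
b = registry.getCellId(np.array([4, 5, 6]))
c = registry.getCellId(np.array([1, 2, 3]))  # repeated set, same ID as a
```

This still calls a Python function once per value set, so it would probably remain much slower than a fully vectorized approach for arrays of this size.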

Is there a faster way to generate the single representative value?

Accepted answer

This could be one approach using basic numpy functions:

import numpy as np

# Random input for demo
arr = np.random.randint(0,3,[2,5,4])

# Get dimensions for later usage
m,n,k = arr.shape

# Reshape arr to a 2D array that has each slice arr[:, n, k] in each row
arr2d = np.transpose(arr,(1,2,0)).reshape([-1,m])

# Perform lexsort & get corresponding indices and sorted array
sorted_idx = np.lexsort(arr2d.T)
sorted_arr2d = arr2d[sorted_idx,:]

# Differentiation along rows for sorted array
df1 = np.diff(sorted_arr2d,axis=0)

# Look for changes along df1 that represent new labels to be put there
df2 = np.append([False],np.any(df1!=0,1),0)

# Get unique labels
labels = df2.cumsum(0)

# Store those unique labels in a n x k shaped 2D array
pos_labels = np.zeros_like(labels)
pos_labels[sorted_idx] = labels
out = pos_labels.reshape([n,k])

Sample run:

In [216]: arr
Out[216]:
array([[[2, 1, 2, 1],
        [1, 0, 2, 1],
        [2, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 0, 0, 2]],

       [[2, 1, 2, 2],
        [0, 0, 2, 1],
        [2, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 1, 0]]])

In [217]: out
Out[217]:
array([[6, 4, 6, 5],
       [1, 0, 6, 4],
       [6, 3, 1, 1],
       [3, 0, 4, 1],
       [1, 3, 3, 2]], dtype=int32)
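On NumPy versions where np.unique accepts an axis argument, the same kind of labelling can probably be obtained more compactly by asking for the inverse indices of the unique rows. The following is a minimal sketch, not the answer's method: the function name label_columns is made up for illustration. The label numbers may differ from the sample output above because np.unique orders rows differently from the lexsort call, but identical value sets should still share a label.

```python
import numpy as np


def label_columns(arr):
    """Return an (n, k) array of integer labels, one per arr[:, i, j] vector."""
    m, n, k = arr.shape
    # One row per (i, j) position, holding the m values along the first axis.
    arr2d = np.transpose(arr, (1, 2, 0)).reshape(-1, m)
    # return_inverse gives, for each row, the index of its matching unique row.
    _, inverse = np.unique(arr2d, axis=0, return_inverse=True)
    # ravel() guards against NumPy versions that return a non-1D inverse.
    return np.ravel(inverse).reshape(n, k)


# Random input for demo, same shape and value range as the sample above
rng = np.random.default_rng(0)
arr = rng.integers(0, 3, size=(2, 5, 4))
out = label_columns(arr)
```

Note that the labels are only consistent within a single call. If the same value set must map to the same label across many arrays, one option would be to stack the reshaped rows of all arrays into one 2D array before calling np.unique.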

Regarding "python - Generating unique values from rows in a numpy array", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29535341/
