
python - Finding duplicate matrices in Python?

Reposted. Author: 太空狗. Updated: 2023-10-30 01:06:49

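I have an array a with a.shape: (80000, 38, 38). I want to check whether there are any duplicate or similar (38,38) matrices along the first dimension (in this case, there are 80000 such matrices).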

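I could run two for loops:

import numpy as np

for i in range(a.shape[0]):
    for g in range(i + 1, a.shape[0]):  # skip self-comparisons and repeated pairs
        if (np.abs(a[i, :, :] - a[g, :, :]) < tolerance).all():
            pass  # save the index pair (i, g) here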

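But this seems very inefficient. I know there is numpy.unique, but I'm not sure I understand how it works when you have a set of 2D matrices.

Any suggestions for an efficient way to do this? Is there a way to use broadcasting to compute the element-wise differences between all of the matrices?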

Best answer

Detecting exact duplicate blocks

Here's one approach using lex-sorting -

import numpy as np

# Reshape a to 2D (one row per block), as needed in a few places later on
ar = a.reshape(a.shape[0],-1)

# Get lex-sorted indices
sortidx = np.lexsort(ar.T)

# Lex-sort the reshaped array to bring duplicate rows next to each other.
# Differentiate consecutive rows: a row that differs from its predecessor in
# at least one element starts a new unique row, and each unique row is a
# unique block in axes (1,2) of the original 3D array
out = a[sortidx][np.append(True,(np.diff(ar[sortidx],axis=0)!=0).any(1))]
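The question also asks to "save the index", so it can help to report which original indices are copies of each other, rather than only returning the unique blocks. Below is a minimal sketch that reuses the same lex-sort and `np.diff` step. `duplicate_groups` is a hypothetical helper name, not part of NumPy.

```python
import numpy as np


def duplicate_groups(a):
    """Group indices along axis 0 whose blocks in axes (1,2) are exactly equal.

    Hypothetical helper built on the lex-sort idea above; returns a list of
    index arrays, one per block value that occurs more than once.
    """
    ar = a.reshape(a.shape[0], -1)
    sortidx = np.lexsort(ar.T)
    sorted_rows = ar[sortidx]
    # True where a sorted row differs from its predecessor, i.e. a new group starts
    starts = np.append(True, (np.diff(sorted_rows, axis=0) != 0).any(1))
    # Split the sorted original indices at every group start
    groups = np.split(sortidx, np.flatnonzero(starts)[1:])
    # Keep only groups with more than one member (the actual duplicates)
    return [np.sort(g) for g in groups if len(g) > 1]
```

For the six-block input used in the answer's sample run, `duplicate_groups(a)` returns `[array([0, 2]), array([1, 4])]`.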

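Here's another approach. It treats each block of elements in axes=(1,2) as an indexing tuple and uses these tuples to find the unique blocks (this requires non-negative integer entries) -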

import numpy as np

# Reshape a to 2D (one row per block), as needed in a few places later on
ar = a.reshape(a.shape[0],-1)

# Get the dimension shape, treating each block in axes (1,2) as an indexing tuple
dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())

# Encode each indexing tuple as one linear index; np.unique's return_index
# gives the first position along axis 0 of each unique encoding, which
# indexes the unique blocks in axes (1,2) of the input array
out = a[np.unique(ar.dot(dims),return_index=True)[1]]
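The encoding is collision-free only if the entries are non-negative integers and the largest possible key also fits in the integer dtype. That key is the product of (max + 1) over all columns, minus one. The question's blocks have 38 × 38 = 1444 columns, so this product overflows int64 even if every column's maximum is only 1. NumPy integer arithmetic wraps silently, which means distinct blocks could then receive the same key. Below is a minimal sketch of a guard; `encoding_fits` is a hypothetical helper name.

```python
import numpy as np


def encoding_fits(ar, dtype=np.int64):
    """Return True if the mixed-radix keys ar.dot(dims) cannot overflow `dtype`.

    Hypothetical check for the indexing-tuple approach above: every entry must
    be a non-negative integer, and the largest possible key,
    prod(max_j + 1) - 1 over all columns j, must fit in `dtype`.
    """
    if not np.issubdtype(ar.dtype, np.integer) or ar.min() < 0:
        return False
    limit = np.iinfo(dtype).max
    span = 1  # Python int, so this running product itself never overflows
    for m in ar.max(0):
        span *= int(m) + 1
        if span - 1 > limit:
            return False
    return True
```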

Sample run -

1] Input:

In [151]: a
Out[151]:
array([[[12,  4],
        [ 0,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[12,  4],
        [ 0,  1]],

       [[ 3,  4],
        [ 1,  3]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  0],
        [ 2,  1]]])

2] Output:

In [152]: ar = a.reshape(a.shape[0],-1)
     ...: sortidx = np.lexsort(ar.T)
     ...:

In [153]: a[sortidx][np.append(True,(np.diff(ar[sortidx],axis=0)!=0).any(1))]
Out[153]:
array([[[12,  4],
        [ 0,  1]],

       [[ 3,  0],
        [ 2,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  4],
        [ 1,  3]]])

In [154]: dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())

In [155]: a[np.unique(ar.dot(dims),return_index=True)[1]]
Out[155]:
array([[[12,  4],
        [ 0,  1]],

       [[ 3,  0],
        [ 2,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  4],
        [ 1,  3]]])
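The question mentions `numpy.unique`. Since NumPy 1.13, `np.unique` accepts an `axis` argument, which compares whole blocks along that axis. This avoids both the manual lex-sort and the overflow-prone encoding. Below is a minimal sketch on the sample input above. `np.unique` sorts with the first element as the primary key, so the unique blocks come back in a different order than in `Out[153]`.

```python
import numpy as np

# Sample input from the run above: six 2x2 blocks stacked along axis 0
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])

# axis=0 treats each (rows, cols) block as a single item to compare
uniq, first_idx, inverse, counts = np.unique(
    a, axis=0, return_index=True, return_inverse=True, return_counts=True)

# reshape(-1) keeps the inverse 1-D in case a NumPy version returns extra dims
inverse = inverse.reshape(-1)

# Labels of blocks that occur more than once, and every index holding one of them
dup_labels = np.flatnonzero(counts > 1)
dup_indices = np.flatnonzero(np.isin(inverse, dup_labels))
```

Here `dup_indices` comes out as `[0, 1, 2, 4]`, i.e. both copies of each repeated block.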

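Detecting similar blocks

For the similarity criterion, assuming you mean that every absolute element-wise difference is below the tolerance, i.e. (np.abs(a[i,:,:] - a[g,:,:]) < tolerance).all(), here's a vectorized approach to get the indices of all similar blocks along axes (1,2) in the input array -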

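import numpy as np
# Index pairs (r, c) with r < c, so each pair of blocks is compared once
R,C = np.triu_indices(a.shape[0],1)
# Similar if every elementwise absolute difference is below tolerance
mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
I,G = R[mask], C[mask]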

Sample run -

In [267]: a
Out[267]:
array([[[12,  4],
        [ 0,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[13,  4],
        [ 0,  1]],

       [[ 3,  4],
        [ 1,  3]],

       [[ 2,  4],
        [ 3,  2]],

       [[12,  5],
        [ 1,  1]]])

In [268]: tolerance = 2

In [269]: R,C = np.triu_indices(a.shape[0],1)
     ...: mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
     ...: I,G = R[mask], C[mask]
     ...:

In [270]: I
Out[270]: array([0, 0, 1, 2])

In [271]: G
Out[271]: array([2, 5, 4, 5])
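With the question's 80000 blocks, `np.triu_indices` produces about 3.2 billion pairs. `a[R] - a[C]` would then need a temporary of roughly 3.2e9 × 1444 elements, far beyond available memory. Below is a minimal sketch that applies the same criterion one block at a time. It assumes `a` has a signed integer or floating-point dtype, and `similar_pairs` is a hypothetical helper name. It still performs on the order of N²/2 block comparisons, so at N = 80000 it will be slow; it removes only the memory blow-up.

```python
import numpy as np


def similar_pairs(a, tolerance):
    """Return (I, G) index arrays of block pairs whose elementwise |diff| < tolerance.

    Hypothetical helper: same criterion as the triu_indices version above,
    but compares one block against all later blocks per iteration, so peak
    memory is about one copy of `a` rather than one copy per pair.
    Assumes a signed or floating dtype (unsigned subtraction would wrap).
    """
    ar = a.reshape(a.shape[0], -1)
    I, G = [], []
    for i in range(ar.shape[0] - 1):
        # Broadcast block i against every later block (rows i+1 onwards)
        close = (np.abs(ar[i + 1:] - ar[i]) < tolerance).all(axis=1)
        later = np.flatnonzero(close) + i + 1
        I.append(np.full(later.size, i))
        G.append(later)
    if not I:
        return np.array([], dtype=int), np.array([], dtype=int)
    return np.concatenate(I), np.concatenate(G)
```

On the sample input above with `tolerance = 2`, this returns the same `I` and `G` as `Out[270]` and `Out[271]`.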

Regarding python - Finding duplicate matrices in Python?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34999090/
