
python - Extracting the N closest pairs from a numpy distance array

Reposted. Author: 行者123. Updated: 2023-11-28 21:57:00

I have a large, symmetric, 2D distance array. I want to get the N closest pairs of observations.

The array is stored as a numpy condensed array and has on the order of 100 million observations.
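
For readers unfamiliar with the condensed layout, here is a minimal sketch of how it relates to the full square matrix. The four points are made-up values used only for illustration; the only assumption is that the condensed array comes from scipy.spatial.distance.pdist, as in the example further down.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Four 1-D observations (made-up values, for illustration only).
points = np.array([[0.0], [1.0], [3.0], [7.0]])

condensed = pdist(points, "cityblock")  # length n*(n-1)/2 = 6
square = squareform(condensed)          # full symmetric 4x4 matrix

# The condensed array walks the upper triangle row by row:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
rows, cols = np.triu_indices(len(points), k=1)
for k, (i, j) in enumerate(zip(rows, cols)):
    print(k, (int(i), int(j)), condensed[k], square[i, j])
```

Each condensed index k therefore stands for exactly one pair (i, j) with i < j, which is the mapping the code below has to invert.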

Here is an example that gets the 100 closest distances on a smaller array (~500k observations), but it is a lot slower than I would like.

import numpy as np
import random
import scipy.spatial.distance

N = 100
r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
c = r[:, None]

dists = scipy.spatial.distance.pdist(c, 'cityblock')

# these are the indices of the closest N observations
closest = dists.argsort()[:N]

# but it's really slow to get out the pairs of observations
def condensed_to_square_index(n, c):
    # converts an index in a condensed array to the
    # pair of observations it represents (1-based labels)
    # modified from here: http://stackoverflow.com/questions/5323818/condensed-matrix-function-to-find-pairs
    ti = np.triu_indices(n, 1)
    return ti[0][c] + 1, ti[1][c] + 1

r = []
# number of observations; cast to int because np.triu_indices needs an integer
n = int(np.ceil(np.sqrt(2 * len(dists))))
for i in closest:
    pair = condensed_to_square_index(n, i)
    r.append(pair)

It seems to me there must be a quicker way to do this using standard numpy or scipy functions, but I'm stumped.

Note: if many pairs are equidistant, that's OK. I don't care about their ordering in that case.
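
Since only the N smallest distances are needed and the order among ties does not matter, a full argsort is more work than necessary. The sketch below is one possible alternative using np.argpartition, which is not part of the original post; n_smallest_indices is a hypothetical helper name, and it assumes dists is a non-empty 1-D array.

```python
import numpy as np

def n_smallest_indices(dists, n_closest):
    """Return indices of the n_closest smallest entries, sorted by distance.

    Hypothetical helper for illustration; assumes dists is a non-empty 1-D array.
    """
    n_closest = min(n_closest, len(dists))
    # argpartition moves the n_closest smallest entries to the front,
    # in no particular order, without sorting the whole array.
    part = np.argpartition(dists, n_closest - 1)[:n_closest]
    # Sort only those few candidates by their distance.
    return part[np.argsort(dists[part])]
```

With it, closest = n_smallest_indices(dists, N) could stand in for dists.argsort()[:N]. When several pairs are equidistant, the two may return different but equally valid indices.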

Best answer

You don't need to compute ti on every call to condensed_to_square_index. Here is a basic modification that computes it only once:

import numpy as np
import random
import scipy.spatial.distance

N = 100
r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
c = r[:, None]

dists = scipy.spatial.distance.pdist(c, 'cityblock')

# these are the indices of the closest N observations
closest = dists.argsort()[:N]

# look up the (1-based) pair for condensed index c in precomputed indices
def condensed_to_square_index(ti, c):
    return ti[0][c] + 1, ti[1][c] + 1

r = []
# number of observations; cast to int because np.triu_indices needs an integer
n = int(np.ceil(np.sqrt(2 * len(dists))))
# compute the upper-triangle indices once, outside the loop
ti = np.triu_indices(n, 1)

for i in closest:
    pair = condensed_to_square_index(ti, i)
    r.append(pair)

You can also vectorize the creation of r:

r = list(zip(ti[0][closest] + 1, ti[1][closest] + 1))

or

r = np.vstack(ti)[:, closest] + 1
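
For a condensed array with around 100 million entries, np.triu_indices(n, 1) itself allocates two index arrays of that length. A possible alternative, not part of the original answer, is to compute each pair directly from its condensed index with a closed-form expression. The sketch below assumes the row-major upper-triangle ordering that pdist produces; condensed_to_pairs is a hypothetical name, and because it relies on a floating-point square root it is worth checking against np.triu_indices on a small n before relying on it.

```python
import numpy as np

def condensed_to_pairs(n, k):
    """Map 0-based condensed indices k to 0-based pairs (i, j), i < j.

    Hypothetical helper: n is the number of observations (a Python int),
    k is an int or an integer array of condensed indices.
    """
    k = np.asarray(k, dtype=np.int64)
    # Row index, obtained by inverting the row-start offsets of the upper triangle.
    i = n - 2 - np.floor(
        np.sqrt(-8.0 * k + 4.0 * n * (n - 1) - 7.0) / 2.0 - 0.5
    ).astype(np.int64)
    # Column index, from the offset of k within row i.
    j = k + i + 1 - n * (n - 1) // 2 + (n - i) * (n - i - 1) // 2
    return i, j
```

With closest and n defined as above, i, j = condensed_to_pairs(n, closest) followed by r = np.column_stack((i, j)) + 1 would give one 1-based pair per row without materializing ti.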

Regarding python - Extracting the N closest pairs from a numpy distance array, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20540889/
