python - Get the top-K related documents from a similarity numpy.ndarray

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:14:03

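I am using the document similarity as defined here.

My question is how to get the most related documents out of the resulting numpy.ndarray. Is there a way to sort a numpy array and retrieve the top K most similar documents?

Here is the sample code.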

from sklearn.feature_extraction.text import TfidfVectorizer

poem = ["All the world's a stage",
"And all the men and women merely players",
"They have their exits and their entrances",
"And one man in his time plays many parts",
"His acts being seven ages. At first, the infant",
"Mewling and puking in the nurse's arms",
"And then the whining school-boy, with his satchel",
"And shining morning face, creeping like snail",
"Unwillingly to school. And then the lover",
"Sighing like furnace, with a woeful ballad",
"Made to his mistress' eyebrow. Then a soldier",
"Full of strange oaths and bearded like the pard",
"Jealous in honour, sudden and quick in quarrel",
"Seeking the bubble reputation",
"Even in the cannon's mouth. And then the justice",
"In fair round belly with good capon lined",
"With eyes severe and beard of formal cut",
"Full of wise saws and modern instances",
"And so he plays his part. The sixth age shifts",
"Into the lean and slipper'd pantaloon",
"With spectacles on nose and pouch on side",
"His youthful hose, well saved, a world too wide",
"For his shrunk shank; and his big manly voice",
"Turning again toward childish treble, pipes",
"And whistles in his sound. Last scene of all",
"That ends this strange eventful history",
"Is second childishness and mere oblivion",
"Sans teeth, sans eyes, sans taste, sans everything"]


vect = TfidfVectorizer(min_df=1)
tfidf = vect.fit_transform(poem)

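# Pairwise cosine similarity: TfidfVectorizer L2-normalises each row by default.
# .toarray() is the explicit form of the .A shorthand.
result = (tfidf @ tfidf.T).toarray()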

print(type(result))

print(result)

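Best answer

Set the diagonal elements to zero, then use argsort() to find the top-K indices in the flattened array, and use unravel_index() to convert the 1-D indices back into 2-D indices: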

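import numpy as np

# Zero the diagonal so a document is never matched with itself.
result[np.diag_indices_from(result)] = 0.0
# Flat indices of the 10 largest entries, in ascending order of similarity.
idx = np.argsort(result, axis=None)[-10:]
# Convert the flat indices back to (row, column) pairs.
midx = np.unravel_index(idx, result.shape)
print(midx)
print(result[midx])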

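Result (each pair appears twice, as (i, j) and (j, i), because the similarity matrix is symmetric):

(array([ 8, 14, 1, 0, 11, 17, 8, 10, 6, 8]), array([14, 8, 0, 1, 17, 11, 10, 8, 8, 6]))
[ 0.2329741 0.2329741 0.2379527 0.2379527 0.25723394 0.25723394 0.26570327 0.26570327 0.34954834 0.34954834]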
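
Since the similarity matrix is symmetric, half of the ten entries above are mirror images of the other half. The sketch below is not part of the original answer; it shows two possible variations under the assumption that the input is a square, symmetric similarity matrix. The function names top_k_pairs and top_k_per_document and the small demo matrix are hypothetical, introduced only for illustration. The first function looks only above the diagonal so each pair is reported once; the second returns, for every document, the indices of its k most similar other documents.

```python
import numpy as np


def top_k_pairs(sim, k):
    """Return the k most similar unordered document pairs (i < j), best first."""
    sim = np.asarray(sim, dtype=float)
    # Only look above the diagonal: this skips self-similarity and
    # counts each symmetric pair (i, j) / (j, i) once.
    rows, cols = np.triu_indices_from(sim, k=1)
    scores = sim[rows, cols]
    k = min(k, scores.size)
    # Indices of the k largest scores, in descending order.
    order = np.argsort(scores)[::-1][:k]
    return [(int(rows[t]), int(cols[t]), float(scores[t])) for t in order]


def top_k_per_document(sim, k):
    """Return an (n_docs, k) array of the most similar other documents per row."""
    sim = np.array(sim, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(sim, -np.inf)    # never return a document as its own neighbour
    k = min(k, sim.shape[1] - 1)
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]


# Tiny hand-made symmetric similarity matrix, used only for illustration.
demo = np.array([[1.0, 0.2, 0.9],
                 [0.2, 1.0, 0.5],
                 [0.9, 0.5, 1.0]])
print(top_k_pairs(demo, 2))
print(top_k_per_document(demo, 1))
```

With the question's data, top_k_pairs(result, 5) would be called on the matrix computed above; unlike the in-place diagonal assignment, neither function modifies its input.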

Regarding "python - Get the top-K related documents from a similarity numpy.ndarray", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16993707/
