Python: check whether a sentence contains any word from a list (fuzzy matching)

Reposted. Author: 行者123. Updated: 2023-11-28 22:12:29

I want to extract keywords from a sentence, given a list_of_keywords.

I managed to extract the exact matches:

keywords = set(list_of_keywords)
[word for word in Sentence.split() if word in keywords]
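Iterating over a string directly yields characters, not words, so the sentence has to be split first. Here is a minimal sketch of exact matching. It assumes that matching should ignore case and leading/trailing punctuation, and the helper name `exact_keywords` is hypothetical:

```python
import string

def exact_keywords(sentence, list_of_keywords):
    """Return the words of `sentence` that exactly match a keyword.

    Assumes matching should ignore case and leading/trailing punctuation.
    """
    keywords = {k.lower() for k in list_of_keywords}  # build the set once
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return [w for w in words if w in keywords]

print(exact_keywords("Nuts caused the allergy.", ["allergy", "nuts"]))
# ['nuts', 'allergy']
```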

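Is it possible to extract words that are highly similar to the words in list_of_keywords, i.e. where the cosine similarity between the two words is > 0.8?

For example, the keyword in the list is 'allergy', and the sentence reads

"a severe allergic reaction to nuts in the meal she had consumed."

The cosine similarity between 'allergy' and 'allergic' can be computed as follows: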

cosdis(word2vec('allergy'), word2vec('allergic'))
Out[861]: 0.8432740427115677
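This works through the 0.843 value by hand. It assumes that `word2vec` is a character-count vector like the one defined in the answer below, not an actual word2vec embedding. The counts for 'allergy' are a:1, l:2, e:1, r:1, g:1, y:1. The counts for 'allergic' are a:1, l:2, e:1, r:1, g:1, i:1, c:1. Only the shared characters a, l, e, r, g contribute to the dot product:

```latex
\cos(\mathbf{u},\mathbf{v})
  = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
  = \frac{1\cdot1 + 2\cdot2 + 1\cdot1 + 1\cdot1 + 1\cdot1}{\sqrt{9}\,\sqrt{10}}
  = \frac{8}{3\sqrt{10}} \approx 0.8433
```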

How can I extract 'allergic' from the sentence based on cosine similarity?

Best answer

from collections import Counter
from math import sqrt

def word2vec(word):
    # Note: despite the name, this is not a word2vec embedding; it is a
    # bag-of-characters vector built from character counts.
    # count the characters in word
    cw = Counter(word)
    # precompute the set of distinct characters
    sw = set(cw)
    # precompute the Euclidean "length" (norm) of the word vector
    lw = sqrt(sum(c*c for c in cw.values()))

    # return a tuple
    return cw, sw, lw

def cosdis(v1, v2):
    # which characters are common to the two words?
    common = v1[1].intersection(v2[1])
    # cosine similarity: dot product divided by the product of the norms
    return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]
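The vector only counts characters, so letter order is ignored and anagrams score a perfect 1.0. The small check below shows this. It re-declares the two functions from the answer so that it runs on its own:

```python
from collections import Counter
from math import sqrt

def word2vec(word):
    # Character counts, set of distinct characters, Euclidean norm
    cw = Counter(word)
    return cw, set(cw), sqrt(sum(c * c for c in cw.values()))

def cosdis(v1, v2):
    # Dot product over shared characters divided by both norms
    common = v1[1] & v2[1]
    return sum(v1[0][ch] * v2[0][ch] for ch in common) / v1[2] / v2[2]

# Anagrams have identical character counts, so they look "identical"
print(round(cosdis(word2vec("listen"), word2vec("silent")), 4))    # 1.0
print(round(cosdis(word2vec("allergy"), word2vec("allergic")), 4))  # 0.8433
```

If letter order matters for your data, a sequence-based measure may fit better than this bag-of-characters score.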


list_of_keywords = ['allergy', 'something']
Sentence = 'a severe allergic reaction to nuts in the meal she had consumed.'

threshold = 0.80
for key in list_of_keywords:
    for word in Sentence.split():
        res = cosdis(word2vec(word), word2vec(key))
        if res > threshold:
            print("Found a word with cosine similarity > {}: {} with original word: {}".format(threshold, word, key))

Output:

Found a word with cosine similarity > 0.8: allergic with original word: allergy
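The loop above only prints the matches. The minimal sketch below returns them instead. It computes each keyword vector only once and strips surrounding punctuation such as the period in 'consumed.'. The function names `char_vector`, `cosine` and `find_similar_words` are hypothetical, and the punctuation handling is an assumption, not part of the answer:

```python
import string
from collections import Counter
from math import sqrt

def char_vector(word):
    # Character-count vector plus its Euclidean norm (same idea as word2vec above)
    counts = Counter(word)
    return counts, sqrt(sum(c * c for c in counts.values()))

def cosine(v1, v2):
    (c1, n1), (c2, n2) = v1, v2
    if n1 == 0 or n2 == 0:
        return 0.0  # e.g. a token that was pure punctuation
    dot = sum(c1[ch] * c2[ch] for ch in c1.keys() & c2.keys())
    return dot / (n1 * n2)

def find_similar_words(sentence, keywords, threshold=0.8):
    """Return (word, keyword, score) triples whose score exceeds threshold."""
    key_vecs = [(k, char_vector(k.lower())) for k in keywords]  # computed once
    matches = []
    for raw in sentence.split():
        word = raw.strip(string.punctuation).lower()
        vec = char_vector(word)
        for key, kvec in key_vecs:
            score = cosine(vec, kvec)
            if score > threshold:
                matches.append((word, key, score))
    return matches

sentence = "a severe allergic reaction to nuts in the meal she had consumed."
for word, key, score in find_similar_words(sentence, ["allergy", "something"]):
    print(f"{word} ~ {key}: {score:.3f}")  # allergic ~ allergy: 0.843
```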

Edit:

One-liner:

print([x for x in Sentence.split() for y in list_of_keywords if cosdis(word2vec(x), word2vec(y)) > 0.8])

Output:

['allergic']
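If character-count cosine similarity is not a hard requirement, the standard-library `difflib` module offers a different fuzzy-matching technique. `difflib.get_close_matches(word, possibilities, n, cutoff)` ranks candidates by `SequenceMatcher.ratio()`, which, unlike the character-count vector, takes letter order into account. The minimal sketch below uses it. The helper name `close_keyword_matches` is hypothetical. The cutoff of 0.75 is also an assumption: ratio scores are not on the same scale as the cosine values above, and 'allergic' vs 'allergy' scores exactly 0.8 here, so a 0.8 cutoff would sit right on the boundary.

```python
import difflib
import string

def close_keyword_matches(sentence, keywords, cutoff=0.75):
    """Pair each sentence word with its closest keyword, if one passes `cutoff`.

    Uses difflib's SequenceMatcher ratio, not the character-count cosine
    from the answer; the default cutoff is an illustrative assumption.
    """
    matches = []
    for raw in sentence.split():
        word = raw.strip(string.punctuation).lower()
        best = difflib.get_close_matches(word, keywords, n=1, cutoff=cutoff)
        if best:
            matches.append((word, best[0]))
    return matches

sentence = "a severe allergic reaction to nuts in the meal she had consumed."
print(close_keyword_matches(sentence, ["allergy", "something"]))
# [('allergic', 'allergy')]
```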

For Python: check whether a sentence contains any word from a list (fuzzy matching), see the similar question on Stack Overflow: https://stackoverflow.com/questions/54807745/
