python - How can I find which sentences have the most words in common?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:46:12

Suppose I have a paragraph. I split it into sentences with sent_tokenize:

variable = ['By the 1870s the scientific community and much of the general public had accepted evolution as a fact.',
'However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.',
'Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.']

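Now I split each sentence into words and append them to some variable. How can I find the two sentences that have the most words in common? I'm not sure how to go about it. If I have 10 sentences, I would have 90 checks (between every pair of sentences). Thanks.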

Best answer

You can use the intersection of Python sets.

If you have three sentences like these:

a = "a b c d"
b = "a c x y"
c = "a q v"

You can check how many words two of the sentences have in common:

sameWords = set.intersection(set(a.split(" ")), set(c.split(" ")))
numberOfWords = len(sameWords)
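
Note that split(" ") leaves punctuation and capitalization attached to each word, so with real text such as the paragraph in the question, "evolution." and "Evolution" would count as different words. Below is a minimal sketch of one way to normalize words first, assuming that lowercasing and a simple regular expression are good enough for your text. The helper name word_set is made up for illustration and is not part of the original answer.

```python
import re

def word_set(sentence):
    # Hypothetical helper: lowercase the sentence, then keep runs of
    # letters, digits and apostrophes as words (punctuation is dropped).
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

s1 = "Darwin published his theory of evolution."
s2 = "Evolution was accepted as a fact."

# Set intersection on the normalized words
common = word_set(s1) & word_set(s2)
print(common)
```

The & operator on two sets is equivalent to set.intersection with two arguments.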

With that, you can iterate over your list of sentences and find the two sentences that share the most words. That gives us:

sentences = ["a b c d", "a d e f", "c x y", "a b c d x"]

def similar(s1, s2):
    # Number of distinct words the two sentences share
    sameWords = set.intersection(set(s1.split(" ")), set(s2.split(" ")))
    return len(sameWords)

currentSimilar = 0
s1 = ""
s2 = ""

# Compare each pair once; pairing by position avoids comparing a sentence with itself
for i, sentence in enumerate(sentences):
    for sentence2 in sentences[i + 1:]:
        similarity = similar(sentence, sentence2)
        if similarity > currentSimilar:
            s1 = sentence
            s2 = sentence2
            currentSimilar = similarity

print(s1, s2)
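
The same pairwise search can also be written more compactly with itertools.combinations, which yields each unordered pair exactly once (so 10 sentences give 45 comparisons rather than 90). This is only a sketch of an alternative spelling, not the original answer's code; the name best_pair is an assumption made for illustration.

```python
from itertools import combinations

sentences = ["a b c d", "a d e f", "c x y", "a b c d x"]

def similar(s1, s2):
    # Number of distinct words the two sentences share
    return len(set(s1.split()) & set(s2.split()))

# Pick the pair with the largest number of shared words
best_pair = max(combinations(sentences, 2), key=lambda pair: similar(*pair))
print(best_pair)
```

If several pairs tie for the maximum, max returns the first one it encounters.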

If performance is a concern, there may be a dynamic programming solution to this.
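
One possible direction when there are many sentences, sketched here under the assumption that most words appear in only a few sentences, is an inverted index: map each word to the sentences that contain it, then count shared words only for pairs that actually share something. Note that this is not dynamic programming and is not from the original answer; the function name most_similar_pair is hypothetical. Very common words (such as "the") still produce many pairs, so the gain depends on the data.

```python
from collections import Counter, defaultdict
from itertools import combinations

def most_similar_pair(sentences):
    # Map each word to the indices of the sentences that contain it.
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in set(sentence.split()):
            index[word].add(i)

    # Every word shared by two sentences adds 1 to that pair's count.
    shared = Counter()
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1

    if not shared:
        return None  # no two sentences share a word
    (i, j), count = shared.most_common(1)[0]
    return sentences[i], sentences[j], count

result = most_similar_pair(["a b c d", "a d e f", "c x y", "a b c d x"])
print(result)
```

Pairs of sentences with no words in common never enter the counter, which is where the savings over checking every pair come from.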

Regarding "python - How can I find which sentences have the most words in common?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19840079/