
python - [tricky] Searching for multiple occurrences of a word pair based on proximity. Python

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 17:04:59

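I have a body of text and two keywords, k1 and k2. I want to find every instance where k1 and k2 occur within 5 words of each other. I want to store two pieces of information from this search:

  1. The number of such matches.
  2. The word positions of the best match, where "best" means the occurrence in which k1 and k2 are closest together. This is so that I can focus on that match later.

I have written the code below, but it fails to find a match. It also does not give me the number of matches or the word positions.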

import re
text = 'the flory of gthys inhibition in this proffession by in aquaporin protein-1 its inhibition by the state of the art in aquaporin 2'
a = 'aquaporin protein-1'
b = 'inhibition'
diff = 500
l = re.split(';|,|-| ', text)
l1 = re.split(';|,|-| ', a)
l2 = re.split(';|,|-| ', b)
counts = [m.start() for m in re.finditer(a, text)]
counts1 = [m.start() for m in re.finditer(b, text)]
for cc in counts:
    for c1 in counts1:
        if abs(cc - c1) < diff:
            diff = abs(cc - c1)
            values = (cc, c1)

if text.find(a) < text.find(b):
    r = (l.index(l2[0]) - l.index(l1[-1]))
if text.find(a) > text.find(b):
    r = (l.index(l1[0]) - l.index(l2[-1]))
if r < 5:
    print('matched')
    print(r)

Best answer

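I decided to replace the multi-word keywords in the original text with single tokens. That way phrases can still be detected, because they are no longer broken apart when the string is split on whitespace.

After that, a simple loop over indices and values does the counting and keeps track, in a tuple, of the positions of the keyword pair with the smallest distance.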

text = 'the flory of gthys inhibition in this proffession by in aquaporin protein-1  its inhibition b'
a = 'aquaporin protein-1'
b = 'inhibition'
# Replace each keyword with a single token so that it survives split()
text = text.replace(a, 'k1')
text = text.replace(b, 'k2')
l = text.split()
#print(l)
#print('k1 -> %s' % a)
#print('k2 -> %s' % b)

last_a = -1  # word index of the most recent k1, -1 if none seen yet
last_b = -1  # word index of the most recent k2, -1 if none seen yet
counts = 0
max_match_tuple = (6, 0)  # Initialize it like this since you want to track proximity less than 6
for k, v in enumerate(l):
    #print(str(k) + '--->' + str(v))
    if v == 'k1':
        last_a = k
        if k - last_b < 6 and last_b != -1:
            counts = counts + 1
            if k - last_b < max_match_tuple[0] - max_match_tuple[1]:
                max_match_tuple = (k, last_b)
    if v == 'k2':
        last_b = k
        if k - last_a < 6 and last_a != -1:
            counts = counts + 1
            if k - last_a < max_match_tuple[0] - max_match_tuple[1]:
                max_match_tuple = (k, last_a)  # Careful with the order here since it matters for the subtraction above
print(counts)
print(max_match_tuple)
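
The loop above can also be packaged as a reusable function. The following is only a sketch of the same idea, not part of the original answer: the function name `find_proximity_matches`, the `max_gap` parameter, and the placeholder tokens `__K1__` and `__K2__` are hypothetical names introduced for illustration, and the tokens are assumed not to occur in the text already.

```python
def find_proximity_matches(text, k1, k2, max_gap=5):
    """Count k1/k2 occurrences at most max_gap words apart.

    Returns (count, best). best is a (later_index, earlier_index) tuple of
    word positions for the closest pair, or None if nothing matched.
    """
    # Hypothetical placeholder tokens, assumed not to appear in the text.
    token1, token2 = '__K1__', '__K2__'
    words = text.replace(k1, token1).replace(k2, token2).split()

    last_seen = {token1: None, token2: None}  # most recent index per keyword
    count = 0
    best = None
    for i, word in enumerate(words):
        if word not in last_seen:
            continue
        last_seen[word] = i
        other = token2 if word == token1 else token1
        j = last_seen[other]
        # Compare only against the most recent occurrence of the other keyword.
        if j is not None and i - j <= max_gap:
            count += 1
            if best is None or i - j < best[0] - best[1]:
                best = (i, j)
    return count, best


text = ('the flory of gthys inhibition in this proffession by in '
        'aquaporin protein-1 its inhibition by the state of the art in aquaporin 2')
count, best = find_proximity_matches(text, 'aquaporin protein-1', 'inhibition')
print(count, best)
```

Like the loop it is based on, this pairs each keyword occurrence only with the most recent occurrence of the other keyword, and the positions it returns are indices into the text after replacement, where each keyword counts as one word.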

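Some explanation of the replace step, with an example. You replace each phrase you want to detect with something unique that is not affected by splitting, so that you can test for it in the conditions of the later loop. If you want different keywords, you only need to change the definitions of the variables a and b.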

text = 'the flory of gthys inhibition in this proffession by in aquaporin protein-1 its inhibition by the state of the art in aquaporin 2'

a = 'aquaporin protein-1'
text = text.replace(a, '******')

print(text)

# Output ---> the flory of gthys inhibition in this proffession by in ****** its inhibition by the state of the art in aquaporin 2

b = 'in'
text = text.replace(b, '+++')  # note: str.replace also matches inside longer words

print(text)

# Output ---> the flory of gthys +++hibition +++ this proffession by +++ ****** its +++hibition by the state of the art +++ aquapor+++ 2
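
As the '+++' output shows, a plain `str.replace` also rewrites the keyword where it occurs inside longer words such as "inhibition" and "aquaporin". If that is not wanted, one possible workaround (not part of the original answer) is a regular expression that only matches the phrase when it is not embedded in a longer word. The helper name `replace_whole_phrase` below is hypothetical.

```python
import re


def replace_whole_phrase(text, phrase, token):
    """Replace phrase with token only where it is not part of a longer word."""
    # Lookarounds reject a match that has a word character directly before
    # or after it; re.escape keeps characters such as '-' or '+' literal.
    pattern = r'(?<!\w)' + re.escape(phrase) + r'(?!\w)'
    # A function replacement avoids any escape processing of the token.
    return re.sub(pattern, lambda match: token, text)


text = ('the flory of gthys inhibition in this proffession by in '
        'aquaporin protein-1 its inhibition by the state of the art in aquaporin 2')
marked = replace_whole_phrase(text, 'in', '+++')
print(marked)
```

With this helper, "inhibition" and "aquaporin" are left intact and only the standalone word "in" is replaced.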

Regarding "python - [tricky] Searching for multiple occurrences of a word pair based on proximity. Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34676351/
