python - Iterate over text and find the distance between predefined substrings

Reposted. Author: 行者123. Updated: 2023-12-01 07:21:50

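I want to take a piece of text and work out how close certain tags are to each other in it. The basic idea is to check whether two people are fewer than 14 words apart; if they are, we say they are related.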

My naive implementation works, but only when each person is a single word, because I iterate over the words.

text = """At this moment Robert  who rises at seven and works before 
breakfast came in He glanced at his wife her cheek was
slightly flushed he patted it caressingly What s the
matter my dear he asked She objects to my doing nothing
and having red hair said I in an injured tone Oh of
course he can t help his hair admitted Rose It generally
crops out once in a generation said my brother So does the
nose Rudolf has got them both I must premise that I am going
perforce to rake up the very scandal which my dear Lady
Burlesdon wishes forgotten--in the year 1733 George II
sitting then on the throne peace reigning for the moment and
the King and the Prince of Wales being not yet at loggerheads
there came on a visit to the English Court a certain prince
who was afterwards known to history as Rudolf the Third of Ruritania"""
involved = ['Robert', 'Rose', 'Rudolf the Third',
'a Knight of the Garter', 'James', 'Lady Burlesdon']

# my naive implementation
ws = text.split()
l = len(ws)
for wi,w in enumerate(ws):
    # Skip if the word is not a person
    if w not in involved:
        continue
    # Check next x words for any involved person
    x = 14
    for i in range(wi+1,wi+x):
        # Avoid list index error
        if i >= l:
            break
        # Skip if the word is not a person
        if ws[i] not in involved:
            continue
        # Print related
        print(ws[wi],ws[i])

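Now I would like to upgrade this script to allow multi-word names such as "Lady Burlesdon". I am not entirely sure what the best way to proceed is. Any hints are welcome.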

Best answer

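You could first preprocess the text so that every name in text is replaced with a single-word id. The id must be a string that you do not expect to appear as any other word in the text. While preprocessing, keep a mapping from id to name so that you know which name corresponds to which id. This lets you keep your current algorithm unchanged.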
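A minimal sketch of that preprocessing idea follows. It is one possible reading of the answer, not a definitive implementation. The function names, the `window` parameter, and the `__PERSON_n__` id format are all made up for illustration. It assumes that no name contains a string of that form, that names start and end with a word character, and that replacing longer names first is enough to keep overlapping names apart.

```python
import re


def replace_names_with_ids(text, names):
    """Replace every name in text with a one-word placeholder id.

    Returns the rewritten text and a dict mapping id -> original name.
    The id format is an assumption; pick anything that cannot occur in the text.
    """
    id_to_name = {}
    # Longest names first, so a long name is replaced before any shorter
    # name that happens to be contained in it.
    for n, name in enumerate(sorted(names, key=len, reverse=True)):
        pid = f"__PERSON_{n}__"
        id_to_name[pid] = name
        # Allow any whitespace, including line breaks, between the words of a name.
        pattern = r"\b" + r"\s+".join(re.escape(part) for part in name.split()) + r"\b"
        # Pad with spaces so the id always ends up as a separate token.
        text = re.sub(pattern, f" {pid} ", text)
    return text, id_to_name


def related_pairs(text, names, window=14):
    """Return (name, name) pairs using the same word window as the question's loop."""
    new_text, id_to_name = replace_names_with_ids(text, names)
    words = new_text.split()
    pairs = []
    for i, word in enumerate(words):
        if word not in id_to_name:
            continue
        # Same range as range(wi+1, wi+x) in the original loop;
        # slicing never runs past the end of the list.
        for other in words[i + 1:i + window]:
            if other in id_to_name:
                pairs.append((id_to_name[word], id_to_name[other]))
    return pairs
```

Calling `related_pairs(text, involved)` with the variables from the question returns the related pairs with the original multi-word names restored through the mapping. Because the ids are padded with spaces, punctuation attached to a name becomes its own token and counts toward the distance, which may or may not be what you want.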

Regarding "python - Iterate over text and find the distance between predefined substrings", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57663066/
