
python - Substring replacement based on replace and no-replace rules

Reposted. Author: 行者123. Updated: 2023-12-04 03:51:44

I have a string, together with rules/mappings that say what to replace and what not to replace.

For example:

"This is an example sentence that needs to be processed into a new sentence."
"This is a second example sentence that shows how 'sentence' in 'sentencepiece' should not be replaced."

Replacement rules:

replace_dictionary = {'sentence': 'processed_sentence'}
no_replace_set = {'example sentence'}

Result:

"This is an example sentence that needs to be processed into a new processed_sentence."
"This is a second example sentence that shows how 'processed_sentence' in 'sentencepiece' should not be replaced."

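Additional conditions:

  1. Replace only when the case matches, i.e. matching is case-sensitive.
  2. Replace whole words only; punctuation such as a full stop is ignored when matching and kept after the replacement.

What would be the cleanest way to solve this in Python 3.x?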

Best answer

Based on the answer by 魔偶.

Update

Sorry, I missed the fact that only whole words should be replaced. I have updated my code and also generalised it into a function.

import re

def replace_whole(sentence, replace_token, replace_with, dont_replace):
    # A "whole word" is the token delimited by quotes, punctuation, a space,
    # or the start/end of the string. Adjust delim to your own definition.
    delim = "[\"'.,:; ]"
    rx = f"(?:^|(?<={delim})){re.escape(replace_token)}(?=$|{delim})"
    context_size = len(dont_replace)
    pieces = []
    last = 0
    for m in re.finditer(rx, sentence):
        # Window around the match, sized so that it holds every occurrence
        # of dont_replace that covers this match
        context = sentence[max(0, m.end() - context_size):m.start() + context_size]
        if dont_replace and dont_replace in context:
            continue
        # Copy the text since the previous replacement, then the new word;
        # the surrounding quotes and punctuation are left untouched
        pieces.append(sentence[last:m.start()])
        pieces.append(replace_with)
        last = m.end()
    pieces.append(sentence[last:])
    return "".join(pieces)

finditer() finds every occurrence of the token with a regular expression, because each one has to be checked for being a whole word rather than part of a longer word. You may need to adjust rx to your own definition of a "whole word". For each match, a context window sized by the no_replace rule is taken and checked for the no_replace string. If the string is absent, only that match is replaced, at its own position, so the surrounding quotes and punctuation are kept and other occurrences are not affected.
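
How strict the pattern is decides what counts as a whole word. As a minimal sketch of one alternative definition (the helper name whole_word_spans and the sample text are made up for illustration, they are not part of the answer), the following uses (?<!\w) and (?!\w) lookarounds so that any non-word character, or the start or end of the string, delimits a word:

```python
import re

def whole_word_spans(text, token):
    # Hypothetical helper: return the (start, end) spans where `token` is not
    # directly preceded or followed by a word character (letter, digit, "_").
    pattern = rf"(?<!\w){re.escape(token)}(?!\w)"
    return [m.span() for m in re.finditer(pattern, text)]

text = "sentence, 'sentence' sentencepiece Sentence sentence!"
for start, end in whole_word_spans(text, "sentence"):
    print(start, end, text[start:end])
```

Under this definition the comma, the quotes, and the exclamation mark all delimit the word, while sentencepiece and the capitalised Sentence are skipped. Whether characters such as "-" or "_" should count as delimiters depends on your data.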

With the two example sentences from the question stored in sen1 and sen2, this gives:

>>> replace_whole(sen2, "sentence", "processed_sentence", "example sentence")
"This is a second example sentence that shows how 'processed_sentence' in 'sentencepiece' should not be replaced."

>>> replace_whole(sen1, "sentence", "processed_sentence", "example sentence")
'This is an example sentence that needs to be processed into a new processed_sentence.'
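
The function above takes a single token and a single no-replace phrase, whereas the question states its rules as a dictionary and a set. One possible way to handle both at once, sketched here under the assumption that a word is delimited by non-word characters, is a single alternation in which the protected phrases come first and are copied through unchanged. The name replace_with_rules is hypothetical and this is not the answerer's method:

```python
import re

def replace_with_rules(text, replace_dictionary, no_replace_set):
    # Protected phrases are listed first, so at any position they win over a
    # replaceable word; longer entries go first within each group.
    protected = sorted(no_replace_set, key=len, reverse=True)
    targets = sorted(replace_dictionary, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in protected + targets)
    if not alternatives:
        return text  # no rules, nothing to do
    # Assumption: whole word = no word character directly before or after
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")

    def substitute(match):
        found = match.group(0)
        if found in no_replace_set:
            return found  # protected phrase, keep as is
        return replace_dictionary[found]

    return pattern.sub(substitute, text)

replace_dictionary = {'sentence': 'processed_sentence'}
no_replace_set = {'example sentence'}
sen1 = "This is an example sentence that needs to be processed into a new sentence."
print(replace_with_rules(sen1, replace_dictionary, no_replace_set))
```

Because the regex engine scans from left to right and a match consumes the text it covers, the "sentence" inside "example sentence" is never seen as a separate candidate. Matching is case-sensitive by default, which fits the first condition in the question. A protected phrase only counts here when it is itself a whole-word match.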

Regarding "python - Substring replacement based on replace and no-replace rules", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64371185/
