
python - Concatenating terms using the substitute method with regex

Reposted. Author: 行者123. Updated: 2023-12-04 03:24:24

Summary of problem: I have written a generic regex to capture two groups from a sentence. I then need to append the third term of the second group to the first group. I used the word "and" in the regex as the separator between the two groups. For example:

Input = 'Since, the genetic cells of SAC-1 and RbC-27 synthesis was not caused by WbC-2 of acnes in human face and animals skin.'

Output = 'Since, the genetic cells of SAC-1 synthesis and RbC-27 synthesis was not caused by WbC-2 of acnes in human face skin and animals skin.'

The regex I have tried:

import re
string_ = "Since, the genetic cells of SAC-1 and RbC-27 synthesis was not caused by WbC-2 of acnes in human face and animals skin."
regex_pattern = re.compile(r"\b([A-Za-z]*-\d+\s*|[A-Za-z]+\s*)\s+(and\s*[A-Za-z]*-\d+\s*[A-Za-z]*|and\s*[A-Za-z]+\s*[A-Za-z]+)?")
print(regex_pattern.findall(string_))
print(regex_pattern.sub(lambda x: x.group(1) + x.group(2)[2], string_))

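The regex captures the groups, but the line that calls the substitute method raises an error: TypeError: 'NoneType' object is not subscriptable. Any suggestions or help with this problem would be appreciated.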
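The error can be reproduced in isolation. The sketch below does not use the asker's pattern; it uses a deliberately tiny, hypothetical pattern and made-up names (`m`, `missing`, `safe`) to illustrate two things: an optional group that takes no part in a match comes back as None, and indexing a string with [2] yields a single character rather than the third word.

```python
import re

# Hypothetical mini-pattern: the second group is optional, as in the question.
m = re.match(r"(\w+)(\s+and)?", "cells of")
missing = m.group(2)  # the optional group did not participate, so this is None

# Falling back to "" keeps the replacement function from subscripting None.
safe = re.sub(
    r"(\w+)(\s+and)?",
    lambda mo: mo.group(1).upper() + (mo.group(2) or ""),
    "cells and skin",
)

# Indexing a string picks a character; split() first to pick a word.
third_char = "and RbC-27 synthesis"[2]
third_word = "and RbC-27 synthesis".split()[2]

print(missing, safe, third_char, third_word)
```

With the question's pattern, many matches consist of group 1 alone, so any replacement function has to handle a missing group 2 before indexing into it.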

Best answer

Split solution

Although this is not a regex solution, it does work:

from string import punctuation

x = 'Since, the genetic cells of SAC-1 and RbC-27 synthesis was not caused by WbC-2 of acnes in human face and animals skin.'
x = x.split()
for idx, word in enumerate(x):
    if word == "and":
        # strip punctuation or we will get skin. instead of skin
        x[idx] = x[idx + 2].strip(punctuation) + " and"
print(' '.join(x))

The output is:

Since, the genetic cells of SAC-1 synthesis and RbC-27 synthesis was not caused by WbC-2 of acnes in human face skin and animals skin.

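This solution avoids inserting directly into the list, because that would cause index problems while you are iterating over it. Instead, we replace the first "and" in the list with "synthesis and" and the second "and" with "skin and", then rejoin the split string.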
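The same idea can be wrapped in a function. The following is a minimal sketch rather than the answerer's code: the name `share_trailing_term` is hypothetical, it builds a new list instead of replacing entries in place, and it adds a bounds check (an assumption about the desired behavior) so that an "and" with fewer than two words after it is left alone instead of raising IndexError.

```python
from string import punctuation


def share_trailing_term(sentence):
    """Copy the second word after each "and" to the position just before it."""
    words = sentence.split()
    out = []
    for idx, word in enumerate(words):
        # Only act when two more words follow "and"; otherwise keep it unchanged.
        if word == "and" and idx + 2 < len(words):
            # Strip punctuation so "skin." is copied as "skin".
            out.append(words[idx + 2].strip(punctuation))
        out.append(word)
    return " ".join(out)


text = ("Since, the genetic cells of SAC-1 and RbC-27 synthesis was not caused "
        "by WbC-2 of acnes in human face and animals skin.")
print(share_trailing_term(text))
```

Appending to a separate list sidesteps the iteration problem in a different way: the list being iterated over is never modified.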

Regex solution

If you insist on a regex solution, I suggest using re.findall with a pattern that contains a single "and", since that is more general for this problem:

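from string import punctuation
import re

# x must be the original sentence string here, not the list produced by x.split() above
x = 'Since, the genetic cells of SAC-1 and RbC-27 synthesis was not caused by WbC-2 of acnes in human face and animals skin.'
# raw string, so \s reaches the regex engine instead of being treated as a string escape
pattern = re.compile(r"(.*?)\sand\s(.*?)\s([^\s]+)")
result = ''.join([f"{match[0]} {match[2].strip(punctuation)} and {match[1]} {match[2]}" for match in pattern.findall(x)])
print(result)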

Since, the genetic cells of SAC-1 synthesis and RbC-27 synthesis was not caused by WbC-2 of acnes in human face skin and animals skin.

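We use strip(punctuation) again because skin. is captured with its period: we do not want to lose the punctuation at the end of the sentence, but we do want to drop it from the copy placed inside the sentence.

Here is our pattern: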

(.*?)\sand\s(.*?)\s([^\s]+)
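  1. (.*?)\s: matches everything before "and" plus the space in front of it (the group itself captures only the text, not the space)
  2. \s(.*?)\s: captures the word immediately after "and"
  3. ([^\s]+): captures everything that is not whitespace up to the next whitespace (that is, the second word after "and"). This ensures we also capture the punctuation.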
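Since the question asked about the substitute method, here is a hedged sketch of a re.sub variant. It is not the answerer's code: the pattern `AND_PAIR` and the function `copy_shared_term` are hypothetical names, and the pattern is a shortened form that matches only " and" plus the two words after it. No group is optional, so group() never returns None inside the replacement function, and text outside the matches is left untouched by re.sub.

```python
import re
from string import punctuation

# Assumed pattern: " and", then the word after it, then the word after that.
AND_PAIR = re.compile(r"\sand\s(\S+)\s(\S+)")


def copy_shared_term(sentence):
    """Place the word that follows "X and Y" in front of "and" as well."""
    def repl(match):
        after_and, shared = match.group(1), match.group(2)
        # Strip punctuation from the copy only; the original keeps its period.
        return f" {shared.strip(punctuation)} and {after_and} {shared}"
    return AND_PAIR.sub(repl, sentence)


text = ("Since, the genetic cells of SAC-1 and RbC-27 synthesis was not caused "
        "by WbC-2 of acnes in human face and animals skin.")
print(copy_shared_term(text))
```

Unlike joining the results of re.findall, this keeps any text that comes after the last match, for example the words "ran fine today" in "A-1 and B-2 test ran fine today".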

Regarding "python - Concatenating terms using the substitute method with regex", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67881649/
