python - Find and replace a list of words in a file using regular expressions in Python

Reposted. Author: 太空狗. Updated: 2023-10-29 21:31:28

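I want to print the contents of a file to the terminal, highlighting every word from a list along the way, without modifying the original file. Here is a code sample that does not work yet: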

    import re

    def highlight_story(self):
        """Print a file's contents and highlight words in a list."""

        the_file = open(self.filename, 'r')
        file_contents = the_file.read()

        for word in highlight_terms:
            regex = re.compile(
                r'\b'          # Word boundary.
                + word         # Each item in the list.
                + r's{0,1}',   # One optional 's' at the end.
                flags=re.IGNORECASE | re.VERBOSE)
            subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
            result = re.sub(regex, subst, file_contents)

        print(result)
        the_file.close()

    highlight_terms = [
        'dog',
        'hedgehog',
        'grue'
    ]

In practice, only the last item in the list is highlighted, no matter what it is or how long the list is. I assume each substitution is performed and then "forgotten" when the next iteration begins. With the list above, the output looks like this (only "Grues" and "grue" are highlighted):

Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to be delicacies. Dogs can frighten away a grue, however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.

But it should look like this, with "dog", "hedgehog" and "grue" (including plurals such as "dogs") all highlighted:

Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to be delicacies. Dogs can frighten away a grue, however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.

How can I keep the other substitutions from being lost?
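The diagnosis in the question is right: re.sub returns a new string, and every pass of the loop calls it on the unchanged file_contents, so each new result overwrites the previous one. Keeping the loop but feeding each pass's output into the next one fixes that; in terms of the question's code, that means setting result = file_contents before the loop and passing result to re.sub. A minimal standalone sketch, where the function name highlight_loop and the sample sentence are hypothetical and re.VERBOSE is dropped because the pattern contains no whitespace or comments:

```python
import re

# ANSI escape codes from the question: bold text on a red background, then reset.
START, END = '\033[1;41m', '\033[0m'


def highlight_loop(text, terms):
    """Apply one substitution per term, chaining each pass onto the previous result."""
    result = text
    for word in terms:
        # Same pattern as the question: word boundary, the term, an optional 's'.
        regex = re.compile(r'\b' + word + r's?', flags=re.IGNORECASE)
        # Substituting into `result` (not the original `text`) keeps earlier highlights.
        result = regex.sub(START + r'\g<0>' + END, result)
    return result


print(highlight_loop('Dogs and hedgehogs fear a grue.', ['dog', 'hedgehog', 'grue']))
```

Chaining works, but each later pass also scans the escape codes and highlights added by earlier passes, so overlapping terms can interact; the single-pattern approach in the answer makes one pass over the original text instead.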

Best answer

You can change the regular expression to the following:

    # Note the ? instead of {0,1}; it has the same effect.
    regex = re.compile(r'\b(' + '|'.join(highlight_terms) + r')s?', flags=re.IGNORECASE | re.VERBOSE)

Then you won't need the for loop: a single re.sub call highlights every term.
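Putting the combined pattern back into the question's method gives roughly the following. This is a sketch, not the answer author's code: the names highlight_text and highlight_file are hypothetical, and re.escape, the longest-first sort and the empty-list guard are additions not present in the answer. re.VERBOSE is dropped because the pattern contains no whitespace or comments.

```python
import re

# ANSI escape codes from the question: bold text on a red background, then reset.
START, END = '\033[1;41m', '\033[0m'


def highlight_text(text, terms):
    """Highlight every term (plus an optional trailing 's') in a single pass."""
    if not terms:
        return text  # an empty alternation would match at every word boundary
    # Longest terms first, so 'hedgehog' wins over a shorter prefix such as 'hedge';
    # re.escape makes terms containing regex metacharacters match literally.
    alternatives = '|'.join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    regex = re.compile(r'\b(' + alternatives + r')s?', flags=re.IGNORECASE)
    return regex.sub(START + r'\g<0>' + END, text)


def highlight_file(filename, terms):
    """Hypothetical stand-in for highlight_story(): reads the file, never writes it."""
    with open(filename) as f:
        print(highlight_text(f.read(), terms))


print(highlight_text('Dogs and hedgehogs fear a grue.', ['dog', 'hedgehog', 'grue']))
```

Like the answer's pattern, this has no trailing \b, so 'dog' also highlights the first three letters of "dogma"; appending r'\b' after s? restricts matches to whole words with an optional plural s.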

This code takes the list of words and joins them with |, turning them into a single alternation. So if your list looks like this:

    a = ['cat', 'dog', 'mouse']

the regular expression will be:

    \b(cat|dog|mouse)s?
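A quick check of the pattern built from that example list, plus one detail worth knowing: because the pattern contains a capturing group, re.findall returns only the group (the term without its optional s), whereas group 0, which is what \g<0> in the question's substitution refers to, is the whole match. The sample sentence is illustrative only.

```python
import re

a = ['cat', 'dog', 'mouse']
pattern = r'\b(' + '|'.join(a) + r')s?'
print(pattern)

regex = re.compile(pattern, flags=re.IGNORECASE)
sentence = 'Cats chase mice; a dog chases cats.'

# findall returns the captured group only, so the trailing 's' is not included.
print(regex.findall(sentence))
# group(0) is the full match, including the optional 's'.
print([m.group(0) for m in regex.finditer(sentence)])
```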

A similar question, "python - Find and replace a list of words in a file using regular expressions in Python", can be found on Stack Overflow: https://stackoverflow.com/questions/26821226/
