
python - Replacing text with RE - allow the first occurrence, replace the rest

Reposted. Author: 行者123. Updated: 2023-12-04 10:12:33

I'm looking for some ideas on how to accomplish the following:

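  • Allow the first occurrence of a problem word, but ban every later use of it and of the other problem words.
  • Do not modify the original document (a .txt file). Only change what print() outputs.
  • Keep the structure of the email unchanged. Any line breaks, tabs, or odd spacing must stay intact.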

  • Here is the code example:
    import re

    # Sample email is "Hello, banned1. This is banned2. What is going on with
    # banned3? Hopefully banned1 is alright."
    sample_email = open('email.txt', 'r').read()

    # First use of any of these words is allowed; those following are banned
    problem_words = ['banned1', 'banned2', 'banned3']

    # TODO: Filter negative_words into overused_negative_words
    banned_problem_words = []
    for w in problem_words:
        if sample_email.count(f'\\b{w}s?\\b') > 1:
            banned_problem_words.append(w)

    pattern = '|'.join(f'\\b{w}s?\\b' for w in banned_problem_words)


    def list_check(email, pattern):
        return re.sub(pattern, 'REDACTED', email, flags=re.IGNORECASE)


    print(list_check(sample_email, pattern))
    # Result should be: "Hello, banned1. This is REDACTED. What is going on with
    # REDACTED? Hopefully REDACTED is alright."

Best answer

The repl argument of re.sub can be a function that takes a match object and returns the replacement string. Here is my solution:

    import re

    sample_email = open('email.txt', 'r').read()

    # First use of any of these words is allowed; those following are banned
    problem_words = ['banned1', 'banned2', 'banned3']

    pattern = '|'.join(f'\\b{w}\\b' for w in problem_words)

    # Counts matches across all problem words, not per word
    occurrences = 0

    def redact(match):
        global occurrences
        occurrences += 1
        if occurrences > 1:
            return "REDACTED"
        # The very first match is returned unchanged
        return match.group(0)

    replaced = re.sub(pattern, redact, sample_email, flags=re.IGNORECASE)
    print(replaced)
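If the module-level counter is a concern (it is not reset between calls), one possible variation keeps the counter inside a function. This is only a sketch under a few assumptions: the names `redact_after_first` and `replacement` are hypothetical and do not come from the original answer, the email is passed in as a string instead of being read from email.txt, and `re.escape` is added in case a problem word contains regex metacharacters.

```python
import re


def redact_after_first(text, words, replacement="REDACTED"):
    """Keep the first match of any word in `words`; replace every later match."""
    # re.escape guards against words that contain regex metacharacters.
    pattern = re.compile(
        "|".join(rf"\b{re.escape(w)}\b" for w in words),
        flags=re.IGNORECASE,
    )
    seen = 0  # number of matches encountered so far in this call

    def repl(match):
        nonlocal seen
        seen += 1
        # The first match is returned unchanged; later ones are replaced.
        return match.group(0) if seen == 1 else replacement

    return pattern.sub(repl, text)


# Hypothetical sample with a newline, a tab, and a double space.
sample = ("Hello, banned1. This is banned2.\n\tWhat is going on with\n"
          "banned3?  Hopefully banned1 is alright.")
print(redact_after_first(sample, ["banned1", "banned2", "banned3"]))
```

Because the counter lives inside each call, the same function can be run over several emails in a row. Only the matched words are touched, so line breaks, tabs, and spacing pass through unchanged, and the file on disk is never written to.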

(As a further note, str.count does not support regular expressions, but there is no need to count anyway.)
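To illustrate the point about str.count, here is a small sketch using a made-up sample string (not the contents of email.txt). It compares a literal substring count with a regex-based count via `re.findall`.

```python
import re

# Hypothetical sample text for illustration only.
text = "Hello, banned1. This is banned2. Hopefully banned1 is alright."

# str.count searches for the literal characters, backslashes included,
# so a regex-style pattern is never found in ordinary text.
literal_hits = text.count(r"\bbanned1\b")

# re.findall interprets the same string as a regular expression.
regex_hits = len(re.findall(r"\bbanned1\b", text, flags=re.IGNORECASE))

print(literal_hits, regex_hits)
```

This is why the `> 1` check in the question never passes: the literal count stays at zero, so no word ever reaches the banned list.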

Regarding "python - Replacing text with RE - allow the first occurrence, replace the rest", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61258343/
