
python - How to remove the text between two specific words in Python

Reposted. Author: 行者123. Updated: 2023-12-01 07:28:34

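I have parsed a URL with the Beautiful Soup package to get its text. I want to remove all of the text found in the terms and conditions section, that is, every word in the passage "Key Terms: ... T&Cs apply".

Here is what I have tried: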

import re

#"text" is part of the text contained in the url
text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""
# to remove words between "Key" and "T&Cs"
rex=re.compile(r'Key\ (.*?)T&Cs.')
terms_and_cons=rex.findall(text)
text=re.sub("|".join(terms_and_cons)," ",text)
#I also tried: text=re.sub(terms_and_cons[0]," ",text)
print(text)

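Even though the list "terms_and_cons" is non-empty, the code above leaves the string "text" unchanged. How can I remove the text between "Key" and "T&Cs"? Please help. I have been stuck on this seemingly simple piece of code for quite a while, and it is getting very frustrating. Thanks.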

Best answer

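You are missing the re.DOTALL flag in your regular expression. Without it, the dot does not match newline characters, so the pattern cannot span several lines.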
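To see the effect of the flag in isolation, here is a minimal sketch. The string `sample` and the variable names are made up for illustration and are not part of the original question.

```python
import re

# Hypothetical two-line sample; the markers sit on different lines.
sample = "Key line one\nline two T&Cs"

# Without re.DOTALL, "." stops at the newline, so nothing matches.
without_flag = re.search(r"Key\s(.*)T&Cs", sample)
print(without_flag)  # None

# With re.DOTALL, "." also matches "\n", so the group spans both lines.
with_flag = re.search(r"Key\s(.*)T&Cs", sample, re.DOTALL)
print(repr(with_flag.group(1)))  # 'line one\nline two '
```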

Method 1: Using re.sub

import re

text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""

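# Raw string avoids an invalid-escape warning for "\s"; DOTALL lets "." cross newlines.
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
# Replace everything from "Key " up to the last "T&Cs" with just the two markers.
text = rex.sub("Key T&Cs", text)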
print(text)
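Note that the sample text contains "T&Cs" twice. The greedy `(.*)` used above runs to the last occurrence, while the non-greedy `(.*?)` from the question would stop at the first one. A small sketch of the difference, using a shortened version of the text (the shortening and the names `greedy` and `lazy` are assumptions for illustration):

```python
import re

# Shortened sample with two "T&Cs" markers, as in the original text.
text = "Key Terms; Single bets only.\nBonus T&Cs and General T&Cs apply.\n"

# Greedy: ".*" extends to the LAST "T&Cs".
greedy = re.sub(r"Key\s.*T&Cs", "Key T&Cs", text, flags=re.DOTALL)
# Non-greedy: ".*?" stops at the FIRST "T&Cs".
lazy = re.sub(r"Key\s.*?T&Cs", "Key T&Cs", text, flags=re.DOTALL)

print(repr(greedy))  # 'Key T&Cs apply.\n'
print(repr(lazy))    # 'Key T&Cs and General T&Cs apply.\n'
```

Which one is appropriate depends on whether the section to remove should end at the first or the last marker.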

Method 2: Using a group

Match the text with a capture group, then remove that group's text from the original string.

import re

text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""

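# Raw string avoids an invalid-escape warning for "\s"; DOTALL lets "." cross newlines.
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
matches = rex.search(text)
# group(1) holds the text captured between "Key " and the last "T&Cs".
text = text.replace(matches.group(1), "")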
print(text)
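Two things to watch for with this approach: `search` returns `None` when the markers are absent, and `str.replace` removes every occurrence of the captured text, not only the matched one. A possible variant, sketched here as a hypothetical helper named `remove_between` (not part of the original answer), guards against a missing match and cuts the group out by position instead:

```python
import re

def remove_between(text, start="Key", end="T&Cs"):
    """Hypothetical helper: drop the text between `start` and the last `end`."""
    # re.escape keeps characters such as "&" or "." in the markers literal.
    pattern = re.compile(re.escape(start) + r"\s(.*)" + re.escape(end), re.DOTALL)
    match = pattern.search(text)
    if match is None:
        # Markers not found: return the text unchanged.
        return text
    # Slice around the captured group so only this one span is removed.
    return text[:match.start(1)] + text[match.end(1):]

sample = "Intro Key.\nKey Terms; odds of 5.00 (4/1).\nBonus T&Cs apply.\n"
print(repr(remove_between(sample)))  # 'Intro Key.\nKey T&Cs apply.\n'
```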

Regarding "python - How to remove the text between two specific words in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57322034/
