
python - Replace a word in a string with a tag

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:15:29

Let's consider the following HTML snippet:

html = '''
<p>
The chairman of European Union leaders, Donald Tusk, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.
</p>
'''

Let's turn it into a BeautifulSoup object:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

I want to transform that soup object so that its HTML output is:

'''
<p>
The chairman of European Union leaders, <span style="color : red"> Donald Tusk </span>, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.
</p>
'''

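On the doc page of BeautifulSoup I found several examples of how to replace a string, create a new tag, and even insert a new tag at a specific position in the tree, but nothing on how to add a new tag in the middle of a string, as my use case requires.

Any help is very welcome.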

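Best answer

Let me start by thanking you for asking this, because it is a very interesting coding problem.

I spent some time looking into it and finally decided to throw my answer into the ring.

I tried using BeautifulSoup's insert_before() and insert_after() to modify the <p> tag in the sample HTML. I also looked at using BeautifulSoup's extend() and append(). After dozens of attempts, I just could not get the result you asked for.

The code below seems to accomplish the requested HTML modification based on a keyword (e.g. Donald Tusk). I used BeautifulSoup's new_tag() and replace_with() to replace the original tag in the HTML.

The code works, but I am sure it can be improved.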
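For reference, here is a minimal sketch of how insert_after() might be made to work for this case. It operates on the text node inside the <p> rather than on the <p> tag itself. This is not the answerer's code: the sample sentence is shortened, and the variable names (text_node, head, span) are chosen here for illustration. It assumes the keyword appears once in a single text node.

```python
import re

from bs4 import BeautifulSoup, NavigableString

# Shortened sample input, assumed for illustration.
html = "<p>The chairman of European Union leaders, Donald Tusk, will meet May in London on Thursday.</p>"
soup = BeautifulSoup(html, "html.parser")
keyword = "Donald Tusk"

# Find the text node (not the <p> tag) that contains the keyword.
text_node = soup.find(string=re.compile(re.escape(keyword)))
before, after = text_node.split(keyword, 1)

# Build the <span> as a real tag in the tree.
span = soup.new_tag("span")
span["style"] = "color:red"
span.string = keyword

# Swap the original text node for its leading part, then chain the span
# and the trailing text after it.
head = NavigableString(before)
text_node.replace_with(head)
head.insert_after(span)
span.insert_after(NavigableString(after))

print(soup)
```

Because the span is inserted as a tag object, no special output formatter should be needed to keep its angle brackets intact.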

import re

from bs4 import BeautifulSoup

raw_html = """
<p> This is a test. </p>
<p>The chairman of European Union leaders, Donald Tusk, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.</p>
<p> This is also a test. </p>
"""

soup = BeautifulSoup(raw_html, 'lxml')

# find the tag that contains the keyword Donald Tusk
original_tag = soup.find('p', string=re.compile(r'Donald Tusk'))

if original_tag:
    # modify text in the tag that was found in the HTML
    tag_to_modify = str(original_tag.get_text()).replace('Donald Tusk,', '<span style="color:red">Donald Tusk</span>,')

    print(tag_to_modify)
    # outputs
    # The chairman of European Union leaders, <span style="color:red">Donald Tusk</span>, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.

    # create a new <p> tag in the soup
    new_tag = soup.new_tag('p')

    # add the modified text to the new tag
    # setting a tag’s .string attribute replaces the contents with the new string
    new_tag.string = tag_to_modify

    # replace the original tag with the new tag
    old_tag = original_tag.replace_with(new_tag)

# formatter=None, BeautifulSoup will not modify strings on output
# without this the angle brackets will get turned into “&lt;”, and “&gt;”
print(soup.prettify(formatter=None))
# outputs
# <html>
# <body>
# <p>
# This is a test.
# </p>
# <p>
# The chairman of European Union leaders, <span style="color:red">Donald Tusk</span>, will meet May in London on Thursday, a day after the bloc’s Brexit negotiator weakened sterling by issuing another warning to Britain, which is due to leave the bloc in March 2019.
# </p>
# <p>
# This is also a test.
# </p>
# </body>
# </html>
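
One possible improvement, offered as a sketch and not as part of the original answer: the code above stores the <span> markup as plain text inside the new <p>, which is why formatter=None is required and why a later soup.find('span') would not find it. The hypothetical helper below (wrap_keyword is a name made up here, not a BeautifulSoup function) splits each matching text node and inserts real <span> tags, and it also handles a keyword that occurs more than once. The sample HTML is shortened and assumed for illustration.

```python
from bs4 import BeautifulSoup, NavigableString


def wrap_keyword(soup, keyword, style="color:red"):
    """Wrap each occurrence of `keyword` in the soup's text with a <span> tag.

    Hypothetical helper, not part of BeautifulSoup. Returns the number of
    spans that were inserted.
    """
    count = 0
    # Snapshot the matching text nodes first, because the tree is modified below.
    matches = [node for node in soup.find_all(string=True) if keyword in node]
    for node in matches:
        pieces = node.split(keyword)
        for index, piece in enumerate(pieces):
            if index > 0:
                # A real Tag object, so the span becomes part of the parse tree.
                span = soup.new_tag("span")
                span["style"] = style
                span.string = keyword
                node.insert_before(span)
                count += 1
            if piece:
                node.insert_before(NavigableString(piece))
        # The original text node has been re-emitted in front of itself.
        node.extract()
    return count


# Shortened sample input, assumed for illustration.
raw_html = (
    "<p>This is a test.</p>"
    "<p>Donald Tusk will meet May. Later, Donald Tusk leaves.</p>"
)
soup = BeautifulSoup(raw_html, "html.parser")
inserted = wrap_keyword(soup, "Donald Tusk")

print(inserted)
print(soup)
```

With this approach the default output formatter can be left in place, since the angle brackets belong to tag objects and not to text. The trade-off is that the helper only matches a keyword that sits entirely within one text node; a keyword split across nested tags would not be found.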

Regarding python - Replace a word in a string with a tag, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55523073/
