python - How can I fix incorrect HTML tags in a string using Python?

Reposted. Author: 行者123. Updated: 2023-12-02 22:48:15

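I am using Python to generate articles containing HTML tags through OpenAI's API. The articles are long, and most of the time the output is correct, but sometimes the HTML tags are malformed. Here is an example: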

<h3><strong>1. Gaze:</strong></h 3 >
<p><strong>Gaze</ strong>. is a free and easy-to-use video streaming app , supporting both live and pre-recorded content. It supports up to 10 people joining a single session at once, with synchronized video playback for all users. Additionally, Gaze offers its own messaging service so you can chat during the viewing experience.</p>

<h3><strong>2. Chrono:</strong></h 3 >

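How can I fix these HTML tags? I have already tried bs4, but it puts the tags on separate lines, which is not what I want.

Is there another solution in Python?

I tried bs4 but did not get good results...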

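Best answer

The HTML sample in your question needs correcting because its closing tags contain extra spaces. You can fix these malformed closing tags simply by removing all whitespace from them. Here is an example: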

import re

def remove_spaces_from_closing_tags(html):
    fixed_html = ""
    # The regular expression separates tags from all other content
    for tag, other_content in re.findall(r'(<[^>]*>)|([^<]*)', html):
        if tag:
            # A closing tag starts with "</" (stray spaces after "<" are allowed here).
            # Strip all whitespace from closing tags and leave every other tag as is,
            # so tags such as <br /> or <a href="/path"> are not touched.
            fixed_html += re.sub(r'\s+', '', tag) if re.match(r'<\s*/', tag) else tag
        if other_content:
            # Leave other content as is
            fixed_html += other_content
    return fixed_html



# Input malformed HTML
html = """
<h3><strong>1. Gaze:</strong></h 3 >
<p><strong>Gaze</ strong>. is a free and easy-to-use video streaming app , supporting both live and pre-recorded content. It supports up to 10 people joining a single session at once, with synchronized video playback for all users. Additionally, Gaze offers its own messaging service so you can chat during the viewing experience.</p>

<h3><strong>2. Chrono:< / st rong ></h 3 >
"""

print(remove_spaces_from_closing_tags(html))

This example code outputs:

<h3><strong>1. Gaze:</strong></h3>
<p><strong>Gaze</strong>. is a free and easy-to-use video streaming app , supporting both live and pre-recorded content. It supports up to 10 people joining a single session at once, with synchronized video playback for all users. Additionally, Gaze offers its own messaging service so you can chat during the viewing experience.</p>

<h3><strong>2. Chrono:</strong></h3>
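
One way to confirm that a cleaned-up string is well-formed is a small tag-balance check. The sketch below is not part of the original answer; it uses the standard library's `html.parser.HTMLParser`, and the names `TagBalanceChecker`, `is_balanced`, and the short `VOID_TAGS` list are assumptions made for illustration. It is meant for the output after cleanup, since how the parser reports tags that still contain stray spaces may depend on the Python version.

```python
from html.parser import HTMLParser

# Assumed, incomplete list of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # Void elements are never pushed because they are never closed.
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. the implicit end of a self-closing <br/>
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_balanced(html):
    """Return True if every opening tag is closed in the right order."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return not checker.stack and not checker.errors


print(is_balanced("<h3><strong>1. Gaze:</strong></h3>"))
print(is_balanced("<p><strong>Gaze</p>"))
```

The first call reports a balanced fragment and the second reports an unbalanced one, because `<strong>` is still open when `</p>` arrives.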

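You can use the remove_spaces_from_closing_tags function defined above to fix malformed closing tags. Your example does not show any malformed opening tags, but keep in mind that the same approach does not work for them. For example, removing all whitespace from a malformed opening tag such as <h 3 class="some-class"> does not fix it, because the space that separates the tag name from its attributes is removed as well. The remove_spaces_from_closing_tags function therefore only fixes closing tags with extra spaces, like the ones in your example HTML.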
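
To make the limitation concrete, the sketch below first shows what stripping all whitespace does to an opening tag, then tries one possible heuristic for repairing it. This is not from the original answer: `KNOWN_TAGS`, `strip_all_whitespace`, and `fix_opening_tag` are hypothetical names, and the heuristic assumes you know in advance which tag names can appear in the generated articles.

```python
import re

# Hypothetical whitelist of tag names expected in the generated articles.
KNOWN_TAGS = {"h1", "h2", "h3", "h4", "p", "strong", "em", "ul", "ol", "li", "a"}


def strip_all_whitespace(tag):
    """The closing-tag approach: delete every whitespace character."""
    return re.sub(r"\s+", "", tag)


def fix_opening_tag(tag):
    """Rejoin a tag name that was split by spaces, leaving attributes untouched.

    Heuristic: glue the leading tokens together until they spell a known tag
    name. If no known name can be formed, the tag is returned unchanged.
    """
    inner = tag[1:-1].strip()
    name = ""
    for match in re.finditer(r"\S+", inner):
        name += match.group()
        if name.lower() in KNOWN_TAGS:
            # Everything after the matched name is treated as attributes.
            rest = inner[match.end():].strip()
            return f"<{name} {rest}>" if rest else f"<{name}>"
    return tag


broken = '<h 3 class="some-class">'
print(strip_all_whitespace(broken))  # <h3class="some-class">
print(fix_opening_tag(broken))       # <h3 class="some-class">
```

The heuristic stops at the shortest known name, so it can guess wrong when one tag name is a prefix of another; treat it as a starting point rather than a general repair.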

Regarding "python - How can I fix incorrect HTML tags in a string using Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75712694/
