
python - BeautifulSoup : Best ways to comment out a tag instead of extracting it?

Reposted by 太空宇宙, last updated 2023-11-03 16:35:40

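I'm trying to comment out parts of an HTML page that I want to keep for later, instead of extracting them with Beautiful Soup's tag.extract() function. For example: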

<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<p>More Html I'm interested in</p>
<h2> Subheading in the article I also want </h2>
<p>Even more Html i want blah blah blah</p>
<h2> References </h2>
<p>Html I want commented out</p>

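I want everything from the References heading down (including the heading itself) to be commented out. Obviously I could use Beautiful Soup's extract function to pull it out like this: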

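import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "lxml")

# Remove the "References" heading and every sibling tag that follows it
references = soup.find("h2", string=re.compile("References"))
for elm in references.find_next_siblings():
    elm.extract()
references.extract()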
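A minimal sketch of the same loop that comments the nodes out instead of extracting them, by swapping each one for a Comment that holds its serialized HTML (the replace_with() approach the answer below applies to a single tag). The sample markup is hypothetical, and the sketch uses Python's built-in "html.parser" instead of "lxml" only so it runs without the extra dependency.

```python
import re

from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup
data = """
<div>
<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<h2> References </h2>
<p>Html I want commented out</p>
<p>More Html I want commented out</p>
</div>"""

soup = BeautifulSoup(data, "html.parser")

references = soup.find("h2", string=re.compile("References"))
# find_next_siblings() returns a list of the following sibling tags,
# so replacing them while looping over it is safe.
for elm in references.find_next_siblings():
    # Swap the tag for a comment containing its own HTML
    elm.replace_with(Comment(str(elm)))
# Finally comment out the heading itself
references.replace_with(Comment(str(references)))

print(soup.div)
```

One caveat: an HTML comment ends at the first "-->", so if a chunk you comment out already contains a comment of its own, the wrapper closes early and the rest of that chunk leaks back into the rendered page.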

I also know that BeautifulSoup can create comments, which you can use like this:

from bs4 import Comment

commented_tag = Comment(chunk_of_html_parsed_somewhere_else)
soup.append(commented_tag)

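This seems very un-Pythonic and a cumbersome way to wrap an HTML comment directly around a specific tag, especially if that tag sits in the middle of a dense HTML tree. Is there a simpler way to find a tag with Beautiful Soup and then simply place <!-- --> directly before and after it? Thanks in advance.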

Best answer

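Assuming I understood the question correctly, you can use replace_with() to replace the tag with a Comment instance. This is probably the simplest way to comment out an existing tag: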

import re

from bs4 import BeautifulSoup, Comment

data = """
<div>
<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<p>More Html I'm interested in</p>
<h2> Subheading in the article I also want </h2>
<p>Even more Html i want blah blah blah</p>
<h2> References </h2>
<p>Html I want commented out</p>
</div>"""

soup = BeautifulSoup(data, "lxml")
elm = soup.find("h2", string=re.compile("References"))
elm.replace_with(Comment(str(elm)))

print(soup.prettify())

Prints:

<html>
 <body>
  <div>
   <h1>
    Name of Article
   </h1>
   <p>
    First Paragraph I want
   </p>
   <p>
    More Html I'm interested in
   </p>
   <h2>
    Subheading in the article I also want
   </h2>
   <p>
    Even more Html i want blah blah blah
   </p>
   <!--<h2> References </h2>-->
   <p>
    Html I want commented out
   </p>
  </div>
 </body>
</html>
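Since the point of commenting instead of extracting is to get the content back later, here is a minimal sketch of the reverse step, assuming every comment in the document is one you created this way: each Comment's text is re-parsed as an HTML fragment and its nodes are put back where the comment was. The sample markup is hypothetical and "html.parser" is used to avoid the lxml dependency; if the page has comments of its own, filter them (for example by inspecting the comment text) before restoring.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup containing previously commented-out tags
data = "<div><p>Keep me</p><!--<h2> References </h2>--><!--<p>Html I want commented out</p>--></div>"

soup = BeautifulSoup(data, "html.parser")

# find_all() returns a list, so modifying the tree inside the loop is safe
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    # str(comment) is the text between <!-- and -->; parse it as a fragment
    fragment = BeautifulSoup(str(comment), "html.parser")
    # Move each parsed node in front of the comment (insert_before detaches it from the fragment)
    for node in list(fragment.contents):
        comment.insert_before(node)
    # Drop the now-empty comment placeholder
    comment.extract()

print(soup)
```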

Source: a similar question on Stack Overflow, "python - BeautifulSoup : Best ways to comment out a tag instead of extracting it?": https://stackoverflow.com/questions/37234544/
