python - BeautifulSoup: Best ways to comment out a tag instead of extracting it?

Reposted by: 太空宇宙 · Updated: 2023-11-03 16:35:40

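I'm trying to comment out parts of an HTML page that I want to keep for later, rather than removing them with Beautiful Soup's tag.extract() function. For example: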

<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<p>More Html I'm interested in</p>
<h2> Subheading in the article I also want </h2>
<p>Even more Html i want blah blah blah</p>
<h2> References </h2> 
<p>Html I want commented out</p>

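I want everything from the References heading down (the heading included) to be commented out. Obviously I can remove that part with Beautiful Soup's extract function, like this: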

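import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "lxml")  # data holds the HTML shown above

# Find the "References" heading, then drop it and every tag after it.
references = soup.find("h2", text=re.compile("References"))
for elm in references.find_next_siblings():
    elm.extract()
references.extract()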

I also know that BeautifulSoup lets you create comments, which you can use like this:

from bs4 import Comment

commented_tag = Comment(chunk_of_html_parsed_somewhere_else)
soup.append(commented_tag)

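This seems very un-Pythonic, and a cumbersome way to simply wrap HTML comment markers around a specific tag, especially if that tag sits in the middle of a deep HTML tree. Is there an easier way to find a tag with BeautifulSoup and simply place <!-- and --> directly before and after it? Thanks in advance.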

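Best answer

Assuming I understand the problem correctly, you can use replace_with() to replace a tag with a Comment instance. This is probably the simplest way to comment out an existing tag: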

import re

from bs4 import BeautifulSoup, Comment

data = """
<div>
    <h1> Name of Article </h1>
    <p>First Paragraph I want</p>
    <p>More Html I'm interested in</p>
    <h2> Subheading in the article I also want </h2>
    <p>Even more Html i want blah blah blah</p>
    <h2> References </h2>
    <p>Html I want commented out</p>
</div>"""

soup = BeautifulSoup(data, "lxml")
# Locate the "References" heading by its text.
elm = soup.find("h2", text=re.compile("References"))
# Swap the tag for a Comment holding the tag's own markup.
elm.replace_with(Comment(str(elm)))

print(soup.prettify())

Prints:

<html>
 <body>
  <div>
   <h1>
    Name of Article
   </h1>
   <p>
    First Paragraph I want
   </p>
   <p>
    More Html I'm interested in
   </p>
   <h2>
    Subheading in the article I also want
   </h2>
   <p>
    Even more Html i want blah blah blah
   </p>
   <!--<h2> References </h2>-->
   <p>
    Html I want commented out
   </p>
  </div>
 </body>
</html>
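Note that the output above comments out only the heading itself; the paragraph after it is left in place. The question asked for everything from the References heading down to be commented out, so the same replace_with() idea can be extended to the following siblings. The sketch below is one possible way to do that, not part of the original answer. The helper names `comment_out_from` and `restore_comment` are hypothetical, `html.parser` is used instead of `lxml` only so the example runs without an extra dependency, and `string=` is the current name of the `text=` argument. It assumes the section being commented out contains no existing comments (HTML comments cannot be nested) and that only tags, not loose text between them, need to be moved into the comment.

```python
import re

from bs4 import BeautifulSoup, Comment


def comment_out_from(start):
    """Replace `start` and all following sibling tags with one Comment.

    Hypothetical helper, not a Beautiful Soup API.
    """
    # The start tag plus every tag after it on the same level.
    nodes = [start] + list(start.find_next_siblings())
    # Serialize before touching the tree.
    markup = "".join(str(node) for node in nodes)
    # Detach the trailing siblings, then swap the start tag for the comment.
    for node in nodes[1:]:
        node.extract()
    start.replace_with(Comment(markup))


def restore_comment(comment):
    """Parse a comment's text as HTML and put the nodes back in its place.

    Hypothetical helper, not a Beautiful Soup API.
    """
    fragment = BeautifulSoup(str(comment), "html.parser")
    # Copy the list first: moving a node removes it from `fragment`.
    nodes = list(fragment.contents)
    # Insert in reverse so the original order is preserved.
    for node in reversed(nodes):
        comment.insert_after(node)
    comment.extract()


data = """
<div>
    <h1> Name of Article </h1>
    <p>First Paragraph I want</p>
    <p>More Html I'm interested in</p>
    <h2> Subheading in the article I also want </h2>
    <p>Even more Html i want blah blah blah</p>
    <h2> References </h2>
    <p>Html I want commented out</p>
</div>"""

soup = BeautifulSoup(data, "html.parser")
references = soup.find("h2", string=re.compile("References"))
comment_out_from(references)
commented = str(soup)  # heading and paragraph now sit inside one comment
print(commented)

# Later, when the section is wanted again:
hidden = soup.find(string=lambda s: isinstance(s, Comment))
restore_comment(hidden)
print(soup)
```

Building the markup string before modifying the tree keeps the helper simple: once the siblings are extracted they no longer belong to the document, and the single Comment node takes the place of the heading.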

Regarding python - BeautifulSoup: Best ways to comment out a tag instead of extracting it?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37234544/
