
python - BeautifulSoup innerHTML?

Reposted. Author: IT老高. Updated: 2023-10-28 21:45:07

Let's say I have a page with a div. I can easily get that div with soup.find().

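Now that I have the result, I'd like to print the whole innerHTML of that div: that is, I need a single string with all the HTML tags and text together, exactly like the string I'd get in JavaScript with obj.innerHTML. Is this possible?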

Best answer

TL;DR

With BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, an equivalent of the DOM's innerHTML might look something like this:

def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()
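A minimal usage sketch, assuming BeautifulSoup 4 is installed as `bs4` and parsing with the standard-library `html.parser`. The sample markup and the `inner_html` helper name are made up for illustration; this variant returns a `str` via `decode_contents()` rather than bytes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
html = '<div id="box"><p>Hello <b>world</b></p><span>!</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Locate the div, as the question does with soup.find().
div = soup.find("div", id="box")


def inner_html(element):
    """Return the markup inside `element` (excluding its own tag) as a str."""
    return element.decode_contents()


inner = inner_html(div)
print(inner)  # the children of the div, without the <div> wrapper itself
```

By contrast, `str(div)` would include the enclosing `<div id="box">...</div>` tag, which is the outerHTML rather than the innerHTML.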

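These functions aren't currently in the online documentation, so I'll quote the current function definitions and docstrings from the code.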

encode_contents - since 4.0.4

def encode_contents(
        self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
        formatter="minimal"):
    """Renders the contents of this tag as a bytestring.

    :param indent_level: Each line of the rendering will be
        indented this many spaces.

    :param encoding: The bytestring will be in this encoding.

    :param formatter: The output formatter responsible for converting
        entities to Unicode characters.
    """

See also the documentation on formatters; you'll most likely use either formatter="minimal" (the default) or formatter="html" (for HTML entities), unless you want to process the text manually in some way.
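To make the difference between the two formatters concrete, here is a small sketch, assuming BeautifulSoup 4 with the standard-library `html.parser`. The sample paragraph is invented for the illustration.

```python
from bs4 import BeautifulSoup

# Sample markup containing a non-ASCII character (é) and an escaped ampersand.
soup = BeautifulSoup("<p>caf\u00e9 &amp; bar</p>", "html.parser")
p = soup.find("p")

# "minimal" escapes only what is needed for valid markup (such as & and <).
minimal = p.decode_contents(formatter="minimal")

# "html" additionally replaces characters with named HTML entities where one exists.
as_entities = p.decode_contents(formatter="html")

print(minimal)
print(as_entities)
```

With "minimal" the é is kept as a literal character, while "html" renders it as a named entity; the ampersand stays escaped in both cases.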

encode_contents returns an encoded bytestring. If you want a Python Unicode string, use decode_contents instead.


decode_contents - since 4.0.1

decode_contents does the same thing as encode_contents, but returns a Python Unicode string instead of an encoded bytestring.

def decode_contents(self, indent_level=None,
                    eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                    formatter="minimal"):
    """Renders the contents of this tag as a Unicode string.

    :param indent_level: Each line of the rendering will be
        indented this many spaces.

    :param eventual_encoding: The tag is destined to be
        encoded into this encoding. This method is _not_
        responsible for performing that encoding. This information
        is passed in so that it can be substituted in if the
        document contains a <META> tag that mentions the document's
        encoding.

    :param formatter: The output formatter responsible for converting
        entities to Unicode characters.
    """
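The relationship between the two methods can be checked directly. This is a short sketch, assuming BeautifulSoup 4 on Python 3 (where a "Python Unicode string" is simply `str`) and the standard-library `html.parser`; the sample markup is invented.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>na\u00efve</p></div>", "html.parser")
div = soup.find("div")

as_bytes = div.encode_contents()                   # bytes, UTF-8 by default
as_text = div.decode_contents()                    # str, no encoding applied
latin1 = div.encode_contents(encoding="latin-1")   # bytes in another encoding

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_bytes.decode("utf-8") == as_text)
```

Decoding the bytestring with the encoding it was produced in gives back the same text that decode_contents returns, so the choice between the two comes down to whether the caller needs bytes (for example, to write to a binary file) or text.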

BeautifulSoup 3

BeautifulSoup 3 doesn't have the above functions; instead it has renderContents:

def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                   prettyPrint=False, indentLevel=0):
    """Renders the contents of this tag as a string in the given
    encoding. If encoding is None, returns a Unicode string.."""

This function was added back to BeautifulSoup 4 (in 4.0.4) for compatibility with BS3.
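For code that may receive either kind of element, one possible approach is a small wrapper that prefers encode_contents and falls back to renderContents. This is only a sketch: the `inner_html_bytes` name is hypothetical, the fallback branch is untested here because BeautifulSoup 3 runs only on Python 2, and the demonstration assumes BeautifulSoup 4 is installed as `bs4`.

```python
from bs4 import BeautifulSoup


def inner_html_bytes(element, encoding="utf-8"):
    """Return the inner markup of `element` as an encoded bytestring.

    Hypothetical helper: uses encode_contents when the element's class
    defines it (BeautifulSoup 4), otherwise falls back to the BS3-style
    renderContents.
    """
    # Check the class, not the instance, so that the tag's attribute
    # lookup magic (child tags by name) cannot interfere with the test.
    if hasattr(type(element), "encode_contents"):
        return element.encode_contents(encoding=encoding)
    return element.renderContents(encoding)


soup = BeautifulSoup("<p><i>x</i></p>", "html.parser")
result = inner_html_bytes(soup.find("p"))
print(result)
```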

For more on "python - BeautifulSoup innerHTML?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/8112922/
