
python - What encoding does the unicode function convert from in BeautifulSoup?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:36:48

When I use the unicode function with BeautifulSoup, what encoding does it convert from to get Unicode? Does it automatically use soup.originalEncoding?

from BeautifulSoup import BeautifulSoup
doc = "<html><h1>Heading</h1><p>Text"
soup = BeautifulSoup(doc)
print unicode(soup)

Thanks

Best answer

unicode() is a Python built-in function, not part of BeautifulSoup. See the Python documentation for it:

unicode([object[, encoding[, errors]]])

If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, while a value of 'ignore' causes errors to be silently ignored, and a value of 'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded. See also the codecs module.
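The three error-handling modes quoted above can be tried out with a small sketch. It assumes a Python 3 interpreter, where unicode() no longer exists and str(bytes, encoding, errors) plays the same role; the sample bytes and the helper name decode_with are made up for illustration.

```python
# Python 3 sketch: str(raw, encoding, errors) stands in for Python 2's
# unicode(raw, encoding, errors). The sample data is illustrative only.
raw = b"caf\xe9 ok"  # Latin-1 bytes; the lone 0xE9 byte is invalid as UTF-8


def decode_with(errors):
    """Decode the sample bytes as UTF-8 with the given error handler."""
    return str(raw, "utf-8", errors)


# 'strict' (the default) raises UnicodeDecodeError, a subclass of ValueError.
try:
    decode_with("strict")
    strict_result = "decoded"
except ValueError as exc:
    strict_result = type(exc).__name__

ignored = decode_with("ignore")    # the undecodable byte is dropped
replaced = decode_with("replace")  # the undecodable byte becomes U+FFFD
latin1 = str(raw, "latin-1")       # the right codec decodes every byte

print(strict_result)
print(repr(ignored), repr(replaced), repr(latin1))
```

Naming the codec that actually matches the data, as in the last call, is usually preferable to masking failures with 'ignore' or 'replace'.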

If no encoding is specified, sys.getdefaultencoding() is used by default.
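A minimal sketch of what that default means in practice, under two assumptions: Python 2's default encoding is the usual "ascii", and the code is written in Python 3 syntax so it runs on current interpreters (where the default is different). This path applies to byte strings; in Python 2, an object that defines a __unicode__() method is converted by calling that method instead.

```python
import sys

# Illustrative data: "café" encoded as UTF-8 bytes.
data = "caf\u00e9".encode("utf-8")

# Assumption: Python 2's sys.getdefaultencoding() normally returns "ascii".
assumed_py2_default = "ascii"

# Decoding non-ASCII bytes with the assumed default fails ...
try:
    data.decode(assumed_py2_default)
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "failed"

# ... while naming the real encoding succeeds.
explicit = data.decode("utf-8")

# The default of the interpreter actually running this script.
current_default = sys.getdefaultencoding()

print(outcome, explicit, current_default)
```

Passing the encoding explicitly avoids depending on whatever the interpreter's default happens to be.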

This question, "python - What encoding does the unicode function convert from in BeautifulSoup?", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/3192547/
