I recently upgraded BeautifulSoup from version 3.0 to version 4.1 on a Windows machine.
I am now getting a strange error:
  File "C:\path\to\myscript.py", line 230, in soupify
    return BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
AttributeError: type object 'BeautifulSoup' has no attribute 'HTML_ENTITIES'
Here is the code snippet that raises the exception:
def soupify(html):
    return BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
The BS documentation does not mention how the constructor signature changed from v3 to v4. How can I fix this?
Best answer
An incoming HTML or XML entity is always converted into the corresponding Unicode character. Beautiful Soup 3 had a number of overlapping ways of dealing with entities, which have been removed. The BeautifulSoup constructor no longer recognizes the smartQuotesTo or convertEntities arguments. (Unicode, Dammit still has smart_quotes_to, but its default is now to turn smart quotes into Unicode.)
If you want to turn those Unicode characters back into HTML entities on output, rather than turning them into UTF-8 characters, you need to use an output formatter.
Source: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#entities
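To make the quoted passage concrete, here is a minimal sketch of how `soupify` might look after the upgrade. It assumes the `bs4` package is installed and uses the built-in `"html.parser"` backend; the sample markup and the variable names are illustrative only, not taken from the original script. The `convertEntities` argument is simply dropped, since entities are converted to Unicode on input, and `formatter="html"` is passed on output if you want the entities back.

```python
from bs4 import BeautifulSoup


def soupify(html):
    # Beautiful Soup 4 always converts incoming entities to Unicode,
    # so no convertEntities argument is needed (or accepted).
    # "html.parser" is an assumption; any installed parser could be named here.
    return BeautifulSoup(html, "html.parser")


# Illustrative input, not from the original question.
soup = soupify("<p>caf&eacute; &amp; bar</p>")

# Entities arrive as ordinary Unicode characters in the parsed tree.
text = soup.p.get_text()

# Default output formatter: only the characters that must be escaped are escaped.
minimal_markup = soup.p.decode()

# The "html" formatter turns Unicode characters back into named entities.
entity_markup = soup.p.decode(formatter="html")

print(text)
print(minimal_markup)
print(entity_markup)
```

The same `formatter="html"` argument can also be passed to `prettify()` or `encode()` if that is how the script writes its output.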
Regarding "python - 'BeautifulSoup' has no attribute 'HTML_ENTITIES'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11856011/