python - Decoding HTML-encoded strings in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:06:00

I have the following string...

"Scam, hoax, or the real deal, he&#8217;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."

I need to turn it into this string...

Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.

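This is pretty standard HTML encoding, and I can't for the life of me figure out how to convert it in Python.

I found this: GitHub

It comes very close to working, but instead of the apostrophe it outputs some non-Unicode character.

Here is a sample of the output from the GitHub script...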

Scam, hoax, or the real deal, heâs gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
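
The stray "â" is consistent with mojibake rather than a failed decode. The following is a minimal sketch of how that symptom can arise, assuming (this is not confirmed by the question) that the script did produce the right single quotation mark U+2019, and that its UTF-8 bytes were then read as Latin-1 somewhere on the way to the screen. The variable names are illustrative only.

```python
# Assumption: the entity was decoded correctly to U+2019, but the resulting
# UTF-8 bytes were later interpreted as Latin-1 when displayed.
apostrophe = "\u2019"                     # RIGHT SINGLE QUOTATION MARK
utf8_bytes = apostrophe.encode("utf-8")   # three bytes: E2 80 99
garbled = utf8_bytes.decode("latin-1")    # "â" plus two invisible control characters

print(repr(garbled))

# Reversing the wrong step recovers the original character.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == apostrophe)
```

If this is what happened, the fix lies in how the output is encoded or displayed, not in the entity decoding itself.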

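Best answer

What you are trying to do is called "HTML entity decoding", and it has been covered in a number of past Stack Overflow questions.

Here is a code snippet that uses the Beautiful Soup HTML parsing library to decode your example: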

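#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Python 2 and BeautifulSoup 3; bs4 no longer accepts convertEntities.
from BeautifulSoup import BeautifulSoup

string = "Scam, hoax, or the real deal, he&#8217;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
# Parse the string, converting HTML entities to Unicode characters,
# then take the first (and only) text node.
s = BeautifulSoup(string, convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0]
print s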

Here is the output:

Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
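
On Python 3 the same decoding can be done with the standard library alone, which may be simpler than pulling in a parser for a single string. The following is a minimal sketch rather than part of the original answer. It assumes the input carries the apostrophe as the numeric reference `&#8217;`, and the names `encoded` and `decoded` are illustrative.

```python
import html

# Assumption: the apostrophe arrives as the numeric character reference &#8217;.
encoded = (
    "Scam, hoax, or the real deal, he&#8217;s gonna work his way to the "
    "bottom of the sordid tale, and hopefully end up with an arcade game "
    "in the process."
)

# html.unescape converts both numeric (&#8217;) and named (&rsquo;)
# character references to the corresponding Unicode characters.
decoded = html.unescape(encoded)
print(decoded)
```

The result is an ordinary `str`. Whether the apostrophe then displays correctly depends on the encoding of the terminal or file it is written to.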

Regarding python - decoding HTML-encoded strings in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/913933/
