
Displaying Unicode in HTML with Python

Reposted. Author: 太空狗. Last updated: 2023-10-30 01:30:25

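I am writing a script to export my links and their titles from Chrome to HTML.
Chrome bookmarks are stored as JSON, encoded in UTF-8.
Some of the titles are in Russian, so they are stored like this:
"name": "\u0425\u0430\u0431\u0440\..."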

import codecs
f = codecs.open("chrome.json","r", "utf-8")
data = f.readlines()

urls = [] # for links
names = [] # for link titles

ind = 0

for i in data:
    if i.find('"url":') != -1:
        urls.append(i.split('"')[3])
        names.append(data[ind-2].split('"')[3])
    ind += 1

fw = codecs.open("chrome.html","w","utf-8")
fw.write("<html><body>\n")
for n in names:
    fw.write(n + '<br>')
    # print type(n) # this will return <type 'unicode'> for each url!
fw.write("</body></html>")

Now, in chrome.html, they show up as \u0425\u0430\u0431...
How can I turn them back into Russian?
I am using Python 2.5.

**Edit: solved!**

>>> s = '\u041f\u0440\u0438\u0432\u0435\u0442 world!'
>>> type(s)
<type 'str'>

>>> print s.decode('raw-unicode-escape').encode('utf-8')
Привет world!

This is what I needed: converting a str that contains \u041f... escapes into unicode.
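For readers on Python 3, where `str` no longer has a `decode` method, the same idea might be sketched as below. This is an illustration rather than part of the original post: it assumes Python 3, and the names `raw` and `decoded` are made up for the example. Note that the `raw_unicode_escape` codec only understands `\uXXXX` and `\UXXXXXXXX`; it does not interpret other JSON escapes such as `\"`.

```python
# Python 3 sketch (assumption: the original post targets Python 2.5).
# The text as read from the file: literal backslash-u sequences, not Cyrillic.
raw = r'\u041f\u0440\u0438\u0432\u0435\u0442 world!'

# The escaped form is pure ASCII, so turn it into bytes first,
# then let the raw_unicode_escape codec expand the \uXXXX sequences.
decoded = raw.encode('ascii').decode('raw_unicode_escape')

print(decoded)
```

In Python 3 the result is already a text string, so the final `.encode('utf-8')` step is only needed when writing to a file opened in binary mode.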

f = open("chrome.json", "r")
data = f.readlines()
f.close()

urls = [] # for links
names = [] # for link titles

ind = 0

for i in data:
    if i.find('"url":') != -1:
        urls.append(i.split('"')[3])
        names.append(data[ind-2].split('"')[3])
    ind += 1

fw = open("chrome.html","w")
fw.write("<html><body>\n")
for n in names:
    fw.write(n.decode('raw-unicode-escape').encode('utf-8') + '<br>')
fw.write("</body></html>")

Best answer

Incidentally, this is not just about Russian; non-ASCII characters are quite common in page names. Example:

name=u'Python Programming Language \u2013 Official Website'
url=u'http://www.python.org/'

Code like this is fragile:

urls.append(i.split('"')[3])
names.append(data[ind-2].split('"')[3])
# (1) relies on name being 2 lines before url
# (2) fails if there is a `"` in the name
# example: "name": "The \"Fubar\" website",

As an alternative, you can process the input file with the json module. For Python 2.5, you can get simplejson.
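To make the quote problem concrete, here is a small Python 3 sketch comparing quote-splitting with a real JSON parser on a single line. The sample line and the variable names are hypothetical, and wrapping the line in braces is only a trick to make this one-line demo valid JSON; a real script would parse the whole file.

```python
import json

# One line as it might appear in the bookmarks file (hypothetical sample).
line = r'"name": "The \"Fubar\" website",'

# Quote-splitting stops at the first escaped quote and keeps the backslash.
by_split = line.split('"')[3]

# Wrapping the pair in braces makes it a complete JSON object for this demo.
by_json = json.loads('{' + line.rstrip(',') + '}')['name']

# \uXXXX escapes are decoded by the parser as well.
title = json.loads(r'"\u0425\u0430\u0431\u0440"')

print(repr(by_split))
print(by_json)
print(title)
```

The split version yields only a fragment of the title, while the parser returns the full name with the quotes unescaped and the Cyrillic characters decoded.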

Here is a script that emulates yours:

try:
    import json
except ImportError:
    import simplejson as json
import sys

def convert_file(infname, outfname):

    def explore(folder_name, folder_info):
        for child_dict in folder_info['children']:
            ctype = child_dict.get('type')
            name = child_dict.get('name')
            if ctype == 'url':
                url = child_dict.get('url')
                # print "name=%r url=%r" % (name, url)
                fw.write(name.encode('utf-8') + '<br>\n')
            elif ctype == 'folder':
                explore(name, child_dict)
            else:
                print "*** Unexpected ctype=%r ***" % ctype

    f = open(infname, 'rb')
    bmarks = json.load(f)
    f.close()
    fw = open(outfname, 'w')
    fw.write("<html><body>\n")
    for folder_name, folder_info in bmarks['roots'].iteritems():
        explore(folder_name, folder_info)
    fw.write("</body></html>")
    fw.close()

if __name__ == "__main__":
    convert_file(sys.argv[1], sys.argv[2])

Tested with Python 2.5.4 on Windows 7 Pro.
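For anyone running this today, a rough Python 3 port of the same approach could look like the sketch below. It is not the answerer's code. The helper name `bookmark_names`, the `html.escape` call, the `<meta charset>` tag, and the `isinstance` guard that skips non-folder entries under `roots` are all additions made for this illustration, on the assumption that the bookmarks file has the same `roots` / `children` / `type` / `name` layout used above.

```python
import html
import json


def bookmark_names(node):
    """Yield the name of every 'url' entry under a bookmark folder node."""
    for child in node.get('children', []):
        ctype = child.get('type')
        if ctype == 'url':
            yield child.get('name', '')
        elif ctype == 'folder':
            # Recurse into nested folders.
            yield from bookmark_names(child)


def convert_file(infname, outfname):
    # json.load decodes \uXXXX escapes, so names arrive as real text.
    with open(infname, encoding='utf-8') as f:
        bmarks = json.load(f)
    with open(outfname, 'w', encoding='utf-8') as fw:
        # Declaring the charset is an addition, so browsers read the file as UTF-8.
        fw.write('<html><head><meta charset="utf-8"></head><body>\n')
        for folder_info in bmarks['roots'].values():
            # Defensive assumption: skip anything under 'roots' that is not a folder dict.
            if isinstance(folder_info, dict):
                for name in bookmark_names(folder_info):
                    # Escape <, >, & and quotes so titles cannot break the markup.
                    fw.write(html.escape(name) + '<br>\n')
        fw.write('</body></html>')
```

It would be called as `convert_file('chrome.json', 'chrome.html')`, using the file names from the question.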

Regarding displaying Unicode in HTML with Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5127855/
