Python json dump unicode error

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:20:43

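I am trying to store a dictionary as a JSON document with UTF-8 encoding, but I seem to be doing something wrong and cannot figure out what. I have posted the function and the stack trace below.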

def parse_contents(res_dict, file):

    content_payload = res_dict['parse']['wikitext']['*']
    sections_payload = res_dict['parse']['sections']
    db = {}
    #parse_captures = ("Owner", "Description", "Usage", "Examples", "Options", "Misc.")

    def now_next_iter(iterable):
        import itertools
        a, b = itertools.tee(iterable)
        next(b, None)
        return itertools.izip(a, b)

    def remove_tags(text):
        import re
        return re.sub('<[^<]+?>', '', text)

    for cur, nxt in now_next_iter(sections_payload):

        if cur['toclevel'] == 2:
            head = cur['line']
            db[head] = {}
        elif cur['toclevel'] == 3:
            line = cur['line']
            ibo = cur['byteoffset']
            fbo = nxt['byteoffset']

            content = remove_tags(content_payload[ibo:fbo])
            db[head][line] = content #.encode('utf-8')

    with io.open(file, 'w', encoding='utf8') as json_db:
        s = json.dumps( db, sort_keys=True, indent=4,
                        separators=(',', ': '))
        json_db.write(s.encode('utf-8'))

[Image: screenshot of the stack trace, not preserved]

Attempt 1:

I changed the code that writes the file to:

    with io.open(file, 'w', encoding='utf8') as json_db:
        s = json.dumps( db, sort_keys=True, indent=4,
                        ensure_ascii=False, encoding='UTF8', separators=(',', ': '))
        s = s.encode('utf-8')
        json_db.write(s)

Output: this is confusing, because I thought s.encode('utf-8') should convert it to unicode.

[Image: screenshot of the output, not preserved]

Best answer

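You may need to pass the optional argument 'ensure_ascii=False' to json.dumps, and/or set encoding='UTF8' in the json.dumps call itself rather than only in the io.open() call. That lets the json package use its own options for handling non-ASCII data.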
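To make the effect of `ensure_ascii` concrete, here is a minimal sketch. The dictionary `db` is made-up sample data, not the payload from the question.

```python
import json

# Hypothetical sample data containing one non-ASCII character (e with acute).
db = {u"title": u"caf\u00e9"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the result is pure ASCII.
escaped = json.dumps(db)

# ensure_ascii=False: non-ASCII characters are kept as-is in the output text.
literal = json.dumps(db, ensure_ascii=False)

# Both forms are valid JSON and decode to the same dictionary.
same = json.loads(escaped) == json.loads(literal)
```

Either string can be written to a UTF-8 file. The escaped form is simply larger and harder to read when the data has a lot of non-ASCII text.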

See the documentation here: https://docs.python.org/2/library/json.html
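One way the write step might look is sketched below. The helper name `write_json_utf8` and the sample data are hypothetical. The sketch assumes the likely cause of the error is that a file opened with `io.open(..., encoding=...)` expects text, while `.encode('utf-8')` hands it bytes. The `encoding=` argument of `json.dumps` exists only in Python 2, where it describes how byte strings in the input are decoded, so it is left out here to keep the sketch runnable on Python 3 as well.

```python
import io
import json
import os
import tempfile


def write_json_utf8(db, path):
    """Serialize db as JSON and write it to path as UTF-8 text (hypothetical helper)."""
    text = json.dumps(db, sort_keys=True, indent=4,
                      ensure_ascii=False, separators=(',', ': '))
    # On Python 2, dumps() can return a byte string when db holds no unicode
    # objects; decode it so the text-mode file always receives text.
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    # The file object encodes to UTF-8 on write, so there is no .encode() here.
    with io.open(path, 'w', encoding='utf-8') as json_db:
        json_db.write(text)


# Usage with made-up data: write the file, then read it back.
path = os.path.join(tempfile.mkdtemp(), 'db.json')
write_json_utf8({u'Usage': {u'Examples': u'na\u00efve caf\u00e9'}}, path)
with io.open(path, encoding='utf-8') as f:
    loaded = json.load(f)
```

The design choice is to keep everything as text until the file object does the encoding, which avoids encoding the same data twice.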

Regarding the Python json dump unicode error, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34816906/
