
python - Writing a UTF-8 XML file with UTF-8 data using ElementTree

Reposted. Author: 太空狗. Updated: 2023-10-29 17:58:54

I'm trying to write an XML file containing UTF-8 encoded data using ElementTree, like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import xml.etree.ElementTree as ET
import codecs

testtag = ET.Element('unicodetag')
testtag.text = u'Töreboda'  # The o is really ö (o with two dots above). No idea why SO doesn't display this
expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
expfile.close()

This results in the following error:

Traceback (most recent call last):
  File "unicodetest.py", line 10, in <module>
    ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Using the "us-ascii" encoding instead works fine, but it doesn't preserve the Unicode characters in the data. What is happening?

Best answer

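codecs.open expects Unicode strings to be written to the file object, and it takes care of encoding them to UTF-8. ElementTree's write, however, encodes the Unicode strings to UTF-8 byte strings before sending them to the file object. Because the file object wants Unicode strings, it coerces the byte strings back to Unicode using the default ascii codec, and that coercion raises the UnicodeDecodeError.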
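
To make the failing step concrete, here is a minimal sketch that performs the same coercion explicitly. It assumes Python 3, where bytes are never decoded implicitly, so the decode that Python 2 did behind the scenes is written out by hand. The variable names are illustrative only.

```python
# The sample text from the question; \u00f6 is the letter ö.
text = "T\u00f6reboda"

# Step 1: what ElementTree's write does with encoding="UTF-8".
# ö becomes the two bytes 0xc3 0xb6.
encoded = text.encode("utf-8")

# Step 2: what the codecs writer in Python 2 effectively did when it
# received a byte string instead of a Unicode string: decode as ASCII.
try:
    encoded.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The failure is at byte 0xc3, index 1, as in the traceback.
print(error)
```

The byte value and position reported here line up with the last line of the traceback, which suggests the error comes from the double encoding rather than from the data itself.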

Just do this instead:

#expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write('testunicode.xml',encoding="UTF-8",xml_declaration=True)
#expfile.close()
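
As a rough, self-contained version of the same fix, the sketch below passes a path to write and then reads the file back to check the result. It assumes Python 3 (so no u'' prefix is needed). The helper name write_utf8_xml and the temporary directory are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_utf8_xml(path, text):
    """Write a single-element XML file, letting ElementTree do the encoding."""
    root = ET.Element("unicodetag")
    root.text = text
    # Pass the file name rather than a codecs-wrapped file object,
    # so the text is encoded to UTF-8 exactly once.
    ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)


# Write into a throwaway directory to keep the example side-effect free.
out_path = os.path.join(tempfile.mkdtemp(), "testunicode.xml")
write_utf8_xml(out_path, "T\u00f6reboda")

# Read the raw bytes back to inspect what was actually written.
with open(out_path, "rb") as f:
    raw = f.read()

# Parse the file again to confirm the text round-trips.
roundtrip = ET.parse(out_path).getroot().text
print(roundtrip)
```

Note that this writes no byte order mark, unlike the "utf-8-sig" codec used in the question. If a file object is needed in Python 3, opening it in binary mode ("wb") should work as well, since write emits bytes for any encoding other than "unicode".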

Regarding "python - Writing a UTF-8 XML file with UTF-8 data using ElementTree", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10046755/
