
Python: converting a binary file to a string while ignoring non-ASCII characters

Reposted. Author: 行者123. Updated: 2023-11-28 21:15:29

I have a binary file and want to extract all of its ASCII characters while ignoring the non-ASCII ones. Currently I have:

with open(filename, 'rb') as fobj:
    text = fobj.read().decode('utf-16-le')
file = open("text.txt", "w")
file.write("{}".format(text))
file.close()

However, writing to the file fails with UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128). How can I make Python ignore non-ASCII characters?

Best answer

Use the built-in ASCII codec and tell it to ignore any errors, for example:

with open(filename, 'rb') as fobj:
    text = fobj.read().decode('utf-16-le')
file = open("text.txt", "w")
file.write("{}".format(text.encode('ascii', 'ignore')))
file.close()
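The answer's code targets Python 2, where `encode` returns a byte string that can be written directly. As a rough sketch of the same idea on Python 3 (the helper name `extract_ascii`, its parameters, and the sample file names are hypothetical, not from the original answer), the encoded bytes are decoded back to `str` before being written in text mode:

```python
import os
import tempfile


def extract_ascii(src_path, dst_path, encoding="utf-16-le"):
    """Decode src_path, drop non-ASCII characters, write the rest to dst_path.

    Hypothetical helper for illustration; assumes the whole file is valid
    in the given encoding (decode() raises UnicodeDecodeError otherwise).
    """
    with open(src_path, "rb") as src:
        text = src.read().decode(encoding)
    # encode(..., "ignore") silently drops characters outside ASCII;
    # decoding again yields a str suitable for a text-mode file.
    ascii_text = text.encode("ascii", "ignore").decode("ascii")
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(ascii_text)
    return ascii_text


# Small demonstration using a temporary UTF-16-LE file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.bin")
    dst = os.path.join(tmp, "text.txt")
    with open(src, "wb") as f:
        f.write("hello \u00a0 there".encode("utf-16-le"))
    result = extract_ascii(src, dst)
    with open(dst, encoding="ascii") as f:
        written = f.read()
    print(repr(result))
```

Using `with` for both files avoids having to remember the explicit `close()` call. If the input is arbitrary binary data rather than UTF-16-LE text, the decode step itself may fail, so this sketch only covers the case described in the question.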

You can test and experiment with this in the Python interpreter (the session below is from Python 2):

>>> s = u'hello \u00a0 there'
>>> s
u'hello \xa0 there'

Simply trying to convert it to a string raises an exception:

>>> str(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)

...as does trying to encode that unicode string as ASCII:

>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)

...but telling the codec to ignore the characters it cannot handle works fine:

>>> s.encode('ascii', 'ignore')
'hello  there'
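'ignore' is only one of the error handlers that `encode` accepts. As an illustrative sketch on Python 3 (the variable names are arbitrary, and the choice of handlers to compare is mine rather than the answer's), the same string can be encoded with several standard handlers to see what each does with the unencodable character:

```python
s = "hello \u00a0 there"  # contains U+00A0, a non-breaking space

# Standard error handlers accepted by str.encode().
handlers = ["ignore", "replace", "backslashreplace", "xmlcharrefreplace"]
results = {name: s.encode("ascii", name) for name in handlers}

for name, encoded in results.items():
    print(f"{name:18} {encoded!r}")

# ignore             b'hello  there'
# replace            b'hello ? there'
# backslashreplace   b'hello \\xa0 there'
# xmlcharrefreplace  b'hello &#160; there'
```

Note that 'ignore' leaves two consecutive spaces behind, because only the non-breaking space between them is dropped. On Python 3 the results are `bytes` objects, hence the `b` prefix.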

For more on converting a binary file to a string in Python while ignoring non-ASCII characters, see the similar question on Stack Overflow: https://stackoverflow.com/questions/30124649/
