python - Reading unicode from xls in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:54:09

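I'm trying to read a .xls file with Python. The file contains several non-ASCII characters (namely äöü). I have tried both openpyxl and xlrd (I had high hopes for xlrd, since it is supposed to read everything as unicode), but neither worked.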

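I found several answers that deal with encoding/decoding when printing information from an xls file, but I can't even get that far. The script fails as soon as it tries to read the file: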

import xlrd
workbook = xlrd.open_workbook('export_data.xls')

which results in:

Traceback (most recent call last):
  File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
    workbook = xlrd.open_workbook('export_data.xls')
  File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
    bk.get_sheets()
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
    self.get_sheet(sheetno)
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
    sh.read(self)
  File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
    strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
  File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
    return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 55: ordinal not in range(128)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'

I also tried:

workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")

which results in:

Traceback (most recent call last):
  File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
    workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")
  File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
    bk.get_sheets()
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
    self.get_sheet(sheetno)
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
    sh.read(self)
  File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
    strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
  File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
    return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 55: invalid start byte
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
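
Both tracebacks fail on the same byte, 0x92, which suggests the strings in the file are stored in a single-byte Windows code page rather than in ASCII or UTF-8. A minimal sketch, independent of xlrd, of how that byte behaves under a few codecs; the sample bytes are made up for illustration, and the snippet is written for Python 3 rather than the 2.7 used in the question:

```python
# Hypothetical sample: "it's" written with a Windows-style curly apostrophe,
# which single-byte Windows code pages such as cp1252 store as byte 0x92.
sample = b"it\x92s"

def try_decode(data, encoding):
    """Return the decoded text, or the error class name if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__

# Try the two codecs from the tracebacks plus one Windows code page.
results = {enc: try_decode(sample, enc) for enc in ("ascii", "utf-8", "cp1252")}
for enc, outcome in results.items():
    print(enc, "->", repr(outcome))
```

In this sketch, ascii and utf-8 both reject the byte, while cp1252 decodes it to a right single quotation mark (U+2019).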

I have also tried including various versions of this at the top of the script:

# -*- coding: utf-8 -*-

I'm running this with Python 2.7 on a Windows Server 2008 machine.

Best answer

Thanks for the feedback, everyone!

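I ended up fixing it with the encoding_override parameter. I couldn't find the Microsoft documentation on which cp code corresponds to German characters, so I tried all of them. Eventually I got to cp1251 and it worked!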

workbook = xlrd.open_workbook(path, encoding_override="cp1251")
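
The trial-and-error search described in this answer can be sketched as a loop over candidate code pages. This is an illustrative sketch, not the answerer's code: the sample bytes and the shortlist of candidates are assumptions, and it is written for Python 3. One caveat worth checking: a codec that decodes without raising is not necessarily the right one. cp1251 is the Windows Cyrillic code page and cp1252 the Windows Western European one, so the decoded ä, ö, ü should be inspected rather than assumed.

```python
# Hypothetical bytes, as cp1252 would store the text "äöü" followed by a
# curly apostrophe (the 0x92 byte seen in the tracebacks).
raw = b"\xe4\xf6\xfc\x92"

# Assumed shortlist of code pages to try; not exhaustive.
candidates = ["cp1251", "cp1252"]

def decode_with_each(data, encodings):
    """Map each encoding to its decoded text, or None if decoding fails."""
    decoded = {}
    for enc in encodings:
        try:
            decoded[enc] = data.decode(enc)
        except UnicodeDecodeError:
            decoded[enc] = None
    return decoded

decoded = decode_with_each(raw, candidates)
for enc, text in decoded.items():
    # Both codecs succeed, but only one yields the German letters.
    print(enc, "->", text)
```

Here both code pages decode the bytes without error, but cp1251 turns the umlaut bytes into Cyrillic letters while cp1252 yields äöü. If the cells read with cp1251 show Cyrillic where umlauts are expected, passing encoding_override="cp1252" to xlrd.open_workbook may be worth trying instead.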

For more on "python - Reading unicode from xls in Python", see the similar question on Stack Overflow: https://stackoverflow.com/questions/17193997/
