Python: Parsing a thorn-delimited file - code works on Windows but not on Linux?


Reposted by 太空宇宙, updated 2023-11-04 09:32:51

The following code works fine on Windows 7:

[30]  delim = b'\xc3\xbe'.decode() # 'þ'
[31]  reader = csv.reader(my_file, delimiter=delim)
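
For reference, here is a self-contained sketch of the kind of code around these two lines. The function name read_thorn_file, the sample rows, and the assumption that the data file is cp1252-encoded are all illustrative. Passing encoding= to open() matters for the data file because the default text encoding differs between platforms (typically cp1252 on Western-locale Windows and UTF-8 on Linux). That is a separate issue from the SyntaxError below, which concerns the encoding of the .py file itself.

```python
import csv
import tempfile
from pathlib import Path

# 'þ' as a one-character delimiter; same value as b'\xc3\xbe'.decode().
DELIM = '\xfe'


def read_thorn_file(path, encoding):
    """Parse a thorn-delimited text file into a list of rows (hypothetical helper)."""
    # Pass the data file's encoding explicitly instead of relying on the
    # platform default; newline='' is what the csv docs recommend for readers.
    with open(path, newline='', encoding=encoding) as my_file:
        reader = csv.reader(my_file, delimiter=DELIM)
        return list(reader)


# Hypothetical sample data, written as cp1252 (where 'þ' is the single byte 0xFE).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / 'sample.txt'
    sample_path.write_bytes('id\xfename\xfecity\n1\xfeAda\xfeLondon\n'.encode('cp1252'))
    rows = read_thorn_file(sample_path, encoding='cp1252')

print(rows)  # [['id', 'name', 'city'], ['1', 'Ada', 'London']]
```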

But it fails on my EC2 instance running Python 3.4 on Amazon Linux, with this error:

SyntaxError: Non-UTF-8 code starting with '\xfe' in file data_loader.py on line 30, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I run it from the Linux shell, i.e.:

python3 data_loader.py

However, when I use the Python 3.4 interactive prompt on the EC2 Linux server, I get the expected result:

>>> b'\xc3\xbe'.decode()
'þ'

I've tried setting delim to many different values, including:

delim = '\xfe'

but I get the same error.

Can anyone help me figure out what's going on? As I said, the code runs fine with Python 3.4 on Windows 7.

Thanks!

Best answer

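The error is caused by the non-ASCII character in the comment on line 30. The byte '\xfe' in the message suggests the file was saved in a single-byte encoding such as latin-1 or cp1252, where 'þ' is 0xFE. That byte is not valid UTF-8, which is what Python 3 assumes for a source file with no encoding declaration.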
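
You can trace the byte '\xfe' in the error message with a few lines in any Python 3 interpreter. This sketch assumes, as that byte suggests, that data_loader.py was saved in latin-1 or cp1252. It only compares strings and bytes and does not touch the file. It also shows why rewriting delim had no effect.

```python
# Line 30 from the question, without the trailing comment.
delim = b'\xc3\xbe'.decode()

# The spellings tried in the question all produce the same one-character
# string, so rewriting delim cannot make the SyntaxError go away.
assert delim == '\xfe' == 'þ'

# 'þ' (U+00FE) is one byte in latin-1 and cp1252 but two bytes in UTF-8.
latin1_bytes = delim.encode('latin-1')   # b'\xfe'
cp1252_bytes = delim.encode('cp1252')    # b'\xfe'
utf8_bytes = delim.encode('utf-8')       # b'\xc3\xbe'

# A lone 0xFE byte is never valid UTF-8, so a latin-1/cp1252 source file
# containing 'þ' cannot be read as UTF-8.
try:
    latin1_bytes.decode('utf-8')
    fe_is_valid_utf8 = True
except UnicodeDecodeError:
    fe_is_valid_utf8 = False

print(latin1_bytes, cp1252_bytes, utf8_bytes, fe_is_valid_utf8)
# b'\xfe' b'\xfe' b'\xc3\xbe' False
```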

According to the PEP that Python itself links you to:

This PEP proposes to introduce a syntax to declare the encoding of a Python source file. The encoding information is then used by the Python parser to interpret the file using the given encoding. Most notably this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in an Unicode aware editor.

...

Python will default to ASCII as standard encoding if no other encoding hints are given.
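
The standard library function tokenize.detect_encoding() applies these PEP 263 rules to raw source bytes, so you can use it to check how a given file will be read. Under Python 3 it reports utf-8, not ASCII, when no declaration is present, because PEP 3120 changed the default. The following is a minimal sketch. The helper name source_encoding and the sample bytes, made up to mirror line 30 of the question, are assumptions.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode these source bytes (hypothetical helper)."""
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical script saved as latin-1: the 'þ' in the comment is the byte 0xFE.
undeclared = b"import csv\ndelim = b'\\xc3\\xbe'.decode() # '\xfe'\n"
declared = b"# coding=latin-1\n" + undeclared

print(source_encoding(undeclared))  # utf-8
print(source_encoding(declared))    # iso-8859-1 ('latin-1' is normalized)

# Without a declaration the bytes are decoded as UTF-8, which fails on 0xFE.
try:
    undeclared.decode(source_encoding(undeclared))
    undeclared_ok = True
except UnicodeDecodeError:
    undeclared_ok = False

# With the declaration, the same bytes decode cleanly and the comment reads 'þ'.
declared_text = declared.decode(source_encoding(declared))
print(undeclared_ok, declared_text.endswith("# 'þ'\n"))  # False True
```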

To fix the error, you can either remove the comment from line 30, or declare the file's encoding so that the Python interpreter reads that comment correctly.

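For example, if the source file was saved in the latin-1 encoding when you added the 'þ' character, put this line at the top of the script (PEP 263 requires the declaration to be on the first or second line):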

# coding=latin-1

Replace latin-1 with the file's actual encoding and you should be good to go.
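
To confirm both behaviours end to end, the sketch below writes a small script to a temporary file in latin-1 and runs it with the current interpreter. It runs the script once without the declaration and once with it. The two-line script body, the helper name run_latin1_script, and the reuse of the file name data_loader.py are illustrative assumptions. subprocess.run with capture_output requires Python 3.7 or later.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical two-line script; the comment on the first line contains 'þ',
# like line 30 in the question.
BODY = "delim = b'\\xc3\\xbe'.decode() # 'þ'\nprint(ord(delim))\n"


def run_latin1_script(text: str) -> subprocess.CompletedProcess:
    """Save text as latin-1 (so 'þ' becomes byte 0xFE) and run it with this Python."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "data_loader.py"
        script.write_bytes(text.encode("latin-1"))
        # errors="replace" keeps decoding of the child's output from ever failing.
        return subprocess.run([sys.executable, str(script)],
                              capture_output=True, text=True, errors="replace")


without_cookie = run_latin1_script(BODY)
with_cookie = run_latin1_script("# coding=latin-1\n" + BODY)

# Without a declaration the file is read as UTF-8 and fails to compile.
print(without_cookie.returncode != 0, "SyntaxError" in without_cookie.stderr)  # True True
# With the declaration it runs; ord('þ') is 254.
print(with_cookie.returncode, with_cookie.stdout.strip())  # 0 254
```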

A similar question on Stack Overflow: https://stackoverflow.com/questions/29663920/


