python - UnicodeEncodeError when writing to a file

Reposted. Author: 太空狗. Updated: 2023-10-29 12:29:04


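I have a Python script that works fine on my local machine (OS X), but when I copied it to a server (Debian) it no longer behaves as expected. The script reads an XML file and prints the contents in a new format. On my local machine I can run the script with stdout going either to the terminal or to a file (i.e. > myFile.txt), and both work fine.

On the server (over ssh), however, printing to the terminal works fine, but printing to a file (which is what I actually need) raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128). All files are UTF-8 encoded and declare utf-8 in the magic comment.

If I print the objects inside a list, so that they are shown via str (a trick I usually use to work around encoding problems), the same error is raised.

If I use print( x.encode('utf-8') ), the output consists of bytes literals in code style (e.g. b'1' b'\xd0\x9a\xd0\xb0\xd0\xbc\xd0\xb0').

If I run $ export PYTHONIOENCODING=utf-8 in the shell (as suggested in some SO posts), I get what looks like a binary file: 1 <D0><9A><D0><B0><D0><BC><D0><B0>.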
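
As a quick sanity check on the byte sequence quoted above, the following minimal sketch decodes it as UTF-8 and then tries to encode the result as ASCII. The variable names are illustrative and not part of the original script; only the byte string itself is taken from the question.

```python
# Bytes copied from the output quoted in the question.
raw = b'\xd0\x9a\xd0\xb0\xd0\xbc\xd0\xb0'

# Decoding as UTF-8 succeeds, so the bytes are well-formed UTF-8 text.
text = raw.decode('utf-8')

# Encoding the same text with the ASCII codec fails, which is the
# situation the traceback in the question describes.
try:
    text.encode('ascii')
    ascii_ok = True
    error_message = ''
except UnicodeEncodeError as exc:
    ascii_ok = False
    error_message = str(exc)

print(len(text), ascii_ok)
print(error_message)
```

This suggests that the file produced with PYTHONIOENCODING=utf-8 may already contain correct UTF-8, and that the <D0><9A>... display could come from the program used to view it rather than from Python.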

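I have checked all the locale variables, and the relevant ones match what I have on my local machine.

I could simply process the file locally and upload it, but I would really like to understand what is going on here. Since the Python code works on one computer I am not sure it is relevant, but I am including it below: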

# -*- encoding: utf-8 -*-
import sys, xml.etree.ElementTree as ET

corpus = ET.parse('file.xml')
text = corpus.getroot()
for body in text :
  for sent in body :
    depDOMs = [(0,'') for i in range(len(sent)+1)]
    for word in sent :
      if word.tag == 'LF' :
        pass
      elif 'ID' in word.attrib and 'FEAT' in word.attrib and 'DOM' in word.attrib :
        ID = word.attrib['ID']
        try :
          Form =  word.text.replace(' ','_')
        except AttributeError :
          Form = '_'
        try :
          Lemma =  word.attrib['LEMMA'].replace(' ', '_')
        except KeyError :
          Lemma = '*NULL*'
        CPOS = word.attrib['FEAT'].split()[0]
        POS = word.attrib['FEAT'].replace( ' ' , '_' )
        Feats = '_'
        Head = word.attrib['DOM']
        if Head == '_root' :
          Head = '0'
        try :
          DepRel = word.attrib['LINK']
        except KeyError :
          DepRel = 'ROOT'
        PHead = '_'
        PDepRel = '_'
        try:
          if word.attrib['NODETYPE'] == 'FANTOM' :
            word.attrib['LEMMA'] = '*'+word.attrib['LEMMA']+'*'
        except KeyError :
          pass
        print( ID , Form , Lemma , Feats, CPOS , POS , Head , DepRel , PHead , PDepRel , sep='\t' )
      else :
        print( 'WARNING: what is this?',sent.attrib['ID'],word.attrib)
  print()

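Best answer

The underlying problem may be a misconfigured locale on the Linux machine, which leads Python to be over-cautious when printing non-ASCII characters.

Run locale to check the locale configuration. If something is wrong, you will see output similar to the following: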

$ locale 
locale: Cannot set LC_CTYPE to default locale: No such file or directory 
locale: Cannot set LC_ALL to default locale: No such file or directory 
LANG=en_US.UTF-8 
LANGUAGE=
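
The same situation can also be inspected from inside Python. The sketch below is only a diagnostic aid, and the helper name describe_output_encoding is made up for this example. It reports the encoding Python chose for stdout, whether stdout is a terminal, and the encoding the locale suggests; running it once to the terminal and once with > myFile.txt shows whether the two cases differ.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the settings that decide how print() encodes text."""
    return {
        # Encoding Python selected for stdout (may be None for some streams).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # False when stdout is redirected to a file or a pipe.
        "stdout_isatty": sys.stdout.isatty(),
        # Encoding derived from the locale settings of the environment.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for name, value in describe_output_encoding().items():
    print(name, value, sep="\t")
```

If the reported encoding is an ASCII variant such as ANSI_X3.4-1968, that would be consistent with the 'ascii' codec named in the traceback.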

To fix it:

$ sudo locale-gen "en_US.UTF-8"

(Replace "en_US.UTF-8" with whichever locale is reported as not working.) For details, see: https://askubuntu.com/questions/162391/how-do-i-fix-my-locale-issue
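
If the locale on the server cannot be changed, one possible workaround is to stop relying on the encoding of stdout and write the file from Python with an explicit encoding. This is a sketch of that alternative, not part of the answer above; write_rows, the temporary path, and the sample row are hypothetical and only illustrate the idea.

```python
import os
import tempfile


def write_rows(path, rows):
    """Write tab-separated rows to path, always encoded as UTF-8."""
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            print(*row, sep="\t", file=out)


# Sample row with a Cyrillic word, written with escapes so that this
# snippet itself stays pure ASCII.
sample_rows = [("1", "\u041a\u0430\u043c\u0430", "_")]

out_path = os.path.join(tempfile.mkdtemp(), "myFile.txt")
write_rows(out_path, sample_rows)

# Read the file back as bytes to see exactly what was written.
with open(out_path, "rb") as fh:
    data = fh.read()
print(data)
```

In the script from the question, this would mean passing file=out to each print call instead of redirecting with > myFile.txt. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') is another way to get the same effect while keeping the shell redirection.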

The original question, "python - UnicodeEncodeError when writing to a file", can be found on Stack Overflow: https://stackoverflow.com/questions/32924147/
