python - DictReader and UnicodeError

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:42:24

import csv
import io

def openFile(fileName):
    try:
        trainFile = io.open(fileName, "r", encoding="utf-8")
    except IOError as e:
        print ("File could not be opened: {}".format(e))
    else:
        trainData = csv.DictReader(trainFile)
        print trainData
        return trainData

def computeTFIDF(trainData):
    bodyList = []
    print "Inside computeTFIDF"
    for row in trainData:
        for key, value in row.iteritems():
            print key, unicode(value, "utf-8", "ignore")
    print "Done"
    return

if __name__ == "__main__":
    print "Main"
    trainData = openFile("../Data/TrainSample.csv")
    print "File Opened"
    computeTFIDF(trainData)

Error:

Traceback (most recent call last):
  File "C:\DebSeal\IUB MS Program\IUB Sem III\Facebook Kaggle Comp\Src\facebookChallenge.py", line 62, in <module>
    computeTFIDF(trainData)
  File "C:\DebSeal\IUB MS Program\IUB Sem III\Facebook Kaggle Comp\Src\facebookChallenge.py", line 42, in computeTFIDF
    for row in trainData:
  File "C:\Python27\lib\csv.py", line 104, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 215: ordinal not in range(128)

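TrainSample.csv: a CSV file with 4 columns (with a header row).
OS: Windows 7, 64-bit.
Using Python 2.x.

I don't know what is going wrong here. I told it to ignore encoding errors, but it still throws the same error.

I think the error is raised before control even reaches the decoding call.

Can anyone tell me where I am going wrong?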

Best answer

The Python 2 csv module does not handle Unicode input.
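
The traceback is consistent with a unicode line being coerced to a byte string with the default ASCII codec before parsing. The following is only a sketch of that failure mode outside the csv module; the helper name coerce_to_ascii and the sample line are hypothetical, and the assumption that the coercion uses ASCII is inferred from the error message rather than stated in the original post.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def coerce_to_ascii(text):
    """Hypothetical stand-in for an implicit unicode -> byte string coercion."""
    return text.encode("ascii")


# Sample line containing the same curly quote (u'\u201c') named in the traceback.
line = u"He said \u201chello\u201d"

try:
    coerce_to_ascii(line)
    error = None
except UnicodeEncodeError as exc:
    # Keep a reference so the exception can be inspected after the block.
    error = exc

print(type(error).__name__, "at position", error.start)
```

The exception is raised while the line is being read, which is why the unicode(value, "utf-8", "ignore") call in the question is never reached.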

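Open the file in binary mode and decode the values after parsing it as CSV. This is safe for the UTF-8 codec because newlines, delimiters, and quote characters each encode to a single byte.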
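
The single-byte claim can be checked directly. This is a small sketch, assuming the default "excel" dialect (comma delimiter, double-quote quoting); the name STRUCTURAL is introduced here for illustration only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Characters the csv parser treats specially in the default "excel" dialect.
STRUCTURAL = [u",", u'"', u"\r", u"\n"]

for ch in STRUCTURAL:
    encoded = ch.encode("utf-8")
    # Each structural character is plain ASCII, so UTF-8 stores it as one byte.
    print(repr(ch), "->", len(encoded), "byte(s)")

# The curly quote from the traceback encodes to several bytes, each with the
# high bit set, so none of them can be mistaken for a delimiter, quote, or
# newline when the file is parsed as raw bytes.
curly_bytes = bytearray(u"\u201c".encode("utf-8"))
print([hex(b) for b in curly_bytes])
```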

The csv module documentation includes a UnicodeReader wrapper class in its examples section that will do the decoding for you; it is easily adapted to the DictReader class:

import csv

class UnicodeDictReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.encoding = encoding
        self.reader = csv.DictReader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        # Decode each byte-string value with the configured encoding.
        return {k: unicode(v, self.encoding) for k, v in row.iteritems()}

    def __iter__(self):
        return self

Use it with a file opened in binary mode:

def openFile(fileName):
    try:
        trainFile = open(fileName, "rb")
    except IOError as e:
        print "File could not be opened: {}".format(e)
    else:
        return UnicodeDictReader(trainFile)
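
For comparison, and going beyond what the answer above covers: on Python 3 the csv module works on text, so the decoding can be done by the file object instead of a wrapper class. The sketch below assumes Python 3; the read_rows helper and the in-memory sample data are hypothetical and merely stand in for TrainSample.csv.

```python
import csv
import io

# Hypothetical sample standing in for TrainSample.csv (not from the original post).
sample_bytes = u'id,body\n1,"He said \u201chello\u201d"\n'.encode("utf-8")


def read_rows(binary_file, encoding="utf-8"):
    """Decode a binary file as text and parse it with csv.DictReader (Python 3)."""
    # newline="" leaves line-ending handling to the csv module.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    return list(csv.DictReader(text_file))


rows = read_rows(io.BytesIO(sample_bytes))
print(len(rows), sorted(rows[0]))
```

With a real file, open(fileName, encoding="utf-8", newline="") passed straight to csv.DictReader should have the same effect.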

For more on python - DictReader and UnicodeError, see the similar question on Stack Overflow: https://stackoverflow.com/questions/19740385/
