
python - How do I fix the UnicodeDecodeError I get when trying to remove accents from a Python string?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:23:10

I'm trying to use this function:

```python
import unicodedata

def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', unicode(input_str))
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])
```

I'm calling it in the code below, which unzips and reads files containing non-ASCII strings. But I get this error (raised from the library file C:\Python27\Lib\encodings\utf_8.py):

```
Message           File Name                                                                       Line
Traceback
  <module>        C:\Users\CG\Desktop\Google Drive\Sci&Tech\projects\naivebayes\USSSALoader.py    64
  getNameList     C:\Users\CG\Desktop\Google Drive\Sci&Tech\projects\naivebayes\USSSALoader.py    26
  remove_accents  C:\Users\CG\Desktop\Google Drive\Sci&Tech\projects\naivebayes\USSSALoader.py    17
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 3: ordinal not in range(128)
```

Why does this error occur, and how can I avoid it and get remove_accents working?

Thanks for your help!

The full code is below:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import re
from zipfile import ZipFile
import csv

##def strip_accents(s):
##    return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

import unicodedata

def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', unicode(input_str))
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])

def getNameList():
    namesDict=extractNamesDict()
    maleNames=list()
    femaleNames=list()
    for name in namesDict:
        print name
        # Look up the counts with the original dict key before name is rewritten.
        counts=namesDict[name]
        # name = strip_accents(name)
        name = remove_accents(name)
        tuple=(name,counts[0],counts[1])
        if counts[0]>counts[1]:
            maleNames.append(tuple)
        elif counts[1]>counts[0]:
            femaleNames.append(tuple)
    names=(maleNames,femaleNames)
    # print maleNames
    return names

def extractNamesDict():
    zf=ZipFile('names.zip', 'r')
    filenames=zf.namelist()

    names=dict()
    genderMap={'M':0,'F':1}

    for filename in filenames:
        file=zf.open(filename,'r')
        rows=csv.reader(file, delimiter=',')

        for row in rows:
            #name=row[0].upper().decode('latin1')
            name=row[0].upper()
            gender=genderMap[row[1]]
            count=int(row[2])

            if not names.has_key(name):
                names[name]=[0,0]
            names[name][gender]=names[name][gender]+count

        file.close()
        # print '\tImported %s'%filename
    # print names
    return names

if __name__ == "__main__":
    getNameList()
```

Best answer

The best practice is to decode to Unicode as data enters your program:

```python
for row in rows:
    name=row[0].upper().decode('utf8') # or whatever...you DO need to know the encoding.
```
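In Python 2, `unicode(input_str)` on a byte string decodes it with the default ASCII codec, so any byte at or above 0x80 (such as the 0xe1 in the traceback) raises `UnicodeDecodeError`. Decoding once, explicitly, where the data comes in avoids that. Below is a minimal sketch of the question's `extractNamesDict` for Python 3, where `zf.open()` returns bytes and `io.TextIOWrapper` decodes them before `csv` sees them. The names `extract_names_dict` and `GENDER_INDEX` are hypothetical, and the `encoding='utf-8'` default is an assumption: you need to know the files' real encoding (the question's commented-out line tried `latin1`). On Python 2.7 the `csv` module does not accept Unicode input, so there you would decode each field after reading it, as in the line above.

```python
import csv
import io
from zipfile import ZipFile

# Hypothetical name; mirrors genderMap in the question.
GENDER_INDEX = {'M': 0, 'F': 1}


def extract_names_dict(zip_file, encoding='utf-8'):
    """Return {NAME: [male_count, female_count]} with every NAME already decoded.

    `zip_file` is a path or a binary file object. `encoding` is an assumption:
    it must match how the files inside the archive were actually written.
    """
    names = {}
    with ZipFile(zip_file) as zf:
        for filename in zf.namelist():
            # zf.open() yields bytes; TextIOWrapper decodes them exactly once,
            # here at the input boundary, with an explicitly chosen encoding.
            with io.TextIOWrapper(zf.open(filename), encoding=encoding,
                                  newline='') as text:
                for row in csv.reader(text):
                    if not row:  # skip blank lines
                        continue
                    name = row[0].upper()  # already str: no implicit ASCII decode
                    counts = names.setdefault(name, [0, 0])
                    counts[GENDER_INDEX[row[1]]] += int(row[2])
    return names
```

With the names decoded up front, `remove_accents` only ever receives text and never has to guess an encoding; a wrong `encoding` argument still fails loudly with `UnicodeDecodeError` at the point where the file is read.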

Then remove_accents can simply be:

```python
def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', input_str)
    return u''.join(c for c in nkfd_form if not unicodedata.combining(c))
```
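A quick, self-contained check of that version on already-decoded text. The sample strings are made up for illustration, and the block is written to run unchanged on Python 2.7 and Python 3.

```python
import unicodedata


def remove_accents(input_str):
    """Strip combining marks from an already-decoded (unicode / str) string."""
    # NFKD splits U+00C9 into "E" + U+0301 COMBINING ACUTE ACCENT, and also
    # folds compatibility characters (e.g. the U+FB01 ligature becomes "fi").
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    # Drop every code point whose canonical combining class is non-zero.
    return u''.join(c for c in nfkd_form if not unicodedata.combining(c))
```

For example, `remove_accents(u'JOS\u00c9')` returns `u'JOSE'` and `remove_accents(u'Fran\u00e7ois')` returns `u'Francois'`. Characters with no decomposition, such as Ø (U+00D8) or ß, pass through unchanged, so this strips accents rather than performing full transliteration.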

Encode the data when it leaves your program, e.g. when writing to a file, a database, the terminal, etc.
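A minimal sketch of encoding only on the way out, assuming UTF-8 output and a hypothetical `write_names` helper. `io.open` exists in Python 2.6+ and Python 3 and performs the encoding itself, so the rest of the program deals only with text.

```python
import io


def write_names(path, names, encoding='utf-8'):
    """Write one name per line, encoding text only at the output boundary.

    `names` must be unicode text (str on Python 3); io.open does the encoding,
    so no manual .encode() calls are scattered through the program.
    """
    # newline='\n' keeps the line ending identical on every platform.
    with io.open(path, 'w', encoding=encoding, newline='\n') as f:
        for name in names:
            f.write(name + u'\n')
```

Passing a byte string here fails immediately on Python 2.7 (`io` text files reject `str`) and on Python 3 (`bytes + str` is a `TypeError`), which keeps accidental undecoded data from slipping out.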

Why do you want to remove the accents in the first place?

A similar question, "python - How do I fix the UnicodeDecodeError I get when trying to remove accents from a Python string?", can be found on Stack Overflow: https://stackoverflow.com/questions/12014810/
