
python - Unicode to original characters in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:56:34

For example, when I run:

unicode_string = u"Austro\u002dHungarian_gulden"
unicode_string.encode("ascii", "ignore")

it gives the following output: 'Austro-Hungarian_gulden'

However, I am working with a txt file that contains a data set like this:

Austria\u002dHungary    Austro\u002dHungarian_gulden
Cocos_\u0028Keeling\u0029_Islands Australian_dollar
El_Salvador Col\u00f3n_\u0028currency\u0029
Faroe_Islands Faroese_kr\u00f3na
Georgia_\u0028country\u0029 Georgian_lari

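I have to process this data with regular expressions in Python, so I wrote the script below, but it does not replace the Unicode escape sequences in the strings with the characters they stand for.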

For example:

'\u002d' has appropriate character '-'
'\u0028' has appropriate character '('
'\u0029' has appropriate character ')'

The script that processes the text file:

import re
import collections

def extract():
    filename = raw_input("Enter file Name:")
    in_file = file(filename,"r")
    out_file = file("Attribute.txt","w+")
    for line in in_file:
        values = line.split("\t")
        if values[1]:
            str1 = ""
            for list in values[1]:
                list = re.sub("[^\Da-z0-9A-Z()]","",list)
                list = list.replace('_',' ')
                out_file.write(list)
                str1 += list
        out_file.write(" ")
        if values[2]:
            str2 = ""
            for list in values[2]:
                list = re.sub("[^\Da-z0-9A-Z\n]"," ",list)
                list = list.replace('"','')
                list = list.replace('_',' ')
                out_file.write(list)
                str2 += list
        s1 = str1.lstrip()
        s1 = str1.rstrip()
        s2 = str2.lstrip()
        s2 = str2.rstrip()
        print s1+s2

The expected output for the given data is:

Austria-Hungary Austro-Hungarian gulden
Cocos (Keeling) Islands Australian dollar
El Salvador Coln (currency)
Faroe Islands Faroese krna
Georgia (country) Georgian lari

How can I do this?

Best answer

Use decode("unicode_escape") to convert the input to Unicode, then encode() to convert the output to the encoding of your choice.

>>> r"Austro\u002dHungarian_gulden".decode("unicode_escape")
u'Austro-Hungarian_gulden'
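
The session above is Python 2, where a byte string has a decode() method. In Python 3 a str does not, so the same idea needs a small adaptation. The following is only a minimal sketch, assuming the input lines are pure ASCII (as in the sample data) and that the columns are separated by whitespace; unescape and clean_line are hypothetical helper names, not part of the original answer.

```python
def unescape(text):
    """Turn literal \\uXXXX sequences in ASCII text into real characters."""
    # Assumption: text is pure ASCII, as in the sample data set.
    # Note that "unicode_escape" also interprets other backslash escapes.
    return text.encode("ascii").decode("unicode_escape")


def clean_line(line, ascii_only=False):
    """Decode the escapes, turn underscores into spaces, join the columns."""
    # Split on any whitespace; the fields themselves use '_' instead of spaces.
    fields = [unescape(field).replace("_", " ") for field in line.split()]
    result = " ".join(fields)
    if ascii_only:
        # Mirrors encode("ascii", "ignore") from the question:
        # characters outside ASCII, such as the accented o, are dropped.
        result = result.encode("ascii", "ignore").decode("ascii")
    return result


# Raw strings keep the backslash sequences literal, like the text file does.
sample = [
    r"Austria\u002dHungary Austro\u002dHungarian_gulden",
    r"Cocos_\u0028Keeling\u0029_Islands Australian_dollar",
    r"El_Salvador Col\u00f3n_\u0028currency\u0029",
]

for row in sample:
    print(clean_line(row))
```

Passing ascii_only=True reproduces the ASCII-only form shown in the question's expected output (for instance "Coln" instead of "Colón"); leaving it off keeps the accented characters, which can then be written out in any encoding that supports them, such as UTF-8.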

Regarding python - Unicode to original characters in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17142765/
