
Python: convert code page character numbers to Unicode

Reposted. Author: 行者123. Updated: 2023-12-03 08:58:26

By default, print(chr(195)) displays the Unicode character at position 195 ("Ã"). How can I print chr(195) as it appears in code page 1251, i.e. "Г"? I have tried print(chr(195).decode('cp1252')) and various .encode methods.
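In Python 3, `chr(195)` always returns the Unicode code point U+00C3, and `str` objects have no `.decode()` method (only `bytes` objects do), which is why the attempt above fails. A minimal sketch of the distinction, assuming Python 3:

```python
# chr() maps an integer to a Unicode code point, not to a code page byte
ch = chr(195)
print(ch, hex(ord(ch)))                # Ã 0xc3

# str has no .decode() method in Python 3; only bytes objects can be decoded
print(hasattr(ch, "decode"))           # False

# Build a one-byte bytes object instead and decode it with the code page
print(bytes([195]).decode("cp1251"))   # Г
```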

Thanks to everyone for the help. I now have a program that prints the code pages:

# Print selected code pages named at https://docs.python.org/3.6/library/codecs.html#standard-encodings
# Ian Tresman. 10 November 2018.

codepages = ['cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856',
             'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875',
             'cp932', 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255',
             'cp1256', 'cp1257', 'cp1258', 'latin_1', 'iso8859_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5',
             'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14',
             'iso8859_15', 'iso8859_16', 'koi8_r', 'koi8_t', 'koi8_u', 'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland',
             'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154']

for codepage in codepages:                          # Select each code page in turn
    print(" " * 12 + "Codepage:", codepage)          # Indented code page name
    print("   | 0 1 2 3 4 5 6 7 8 9 A B C D E F")    # Code page columns, A=10, B=11 etc.
    print("   " + "-" * 33)                          # Horizontal rule
    for row in range(32, 256, 16):                   # For each row (ignore control characters < 32)
        print(f"{row:3}:", end=' ')                   # Print row code (decimal)
        for col in range(16):                         # For each column
            char = row + col                          # Character number (similar to an ASCII code)
            try:                                      # Try to get the Unicode equivalent of this byte value
                unichar = bytes([char]).decode(codepage)
            except UnicodeDecodeError:                # Raised for byte values the code page does not define
                unichar = " "                         # If there is no Unicode equivalent, use a space
            if not unichar.isprintable():             # If the character is not printable, use a space
                unichar = " "
            print(unichar, end=' ')
        print()                                       # End of row
    print()                                           # Blank line after each code page
    input("")                                         # Pause after each code page; press Enter to continue
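If you want the same table as a string (for example to write it to a file or to test it) rather than printing it interactively, the loop above can be factored into a function. This is a minimal sketch; the name `codepage_table` is hypothetical, and it applies the same rule as the program (undecodable or non-printable bytes become a space) for byte values 32 to 255:

```python
def codepage_table(codepage: str) -> str:
    """Return a 16-column table of byte values 32..255 decoded with `codepage`.

    Hypothetical helper: bytes that cannot be decoded, or that decode to a
    non-printable character, are shown as a space.
    """
    lines = ["   | " + " ".join("0123456789ABCDEF"), "   " + "-" * 33]
    for row in range(32, 256, 16):          # rows start at 32, 48, ..., 240
        cells = []
        for col in range(16):
            try:
                ch = bytes([row + col]).decode(codepage)
            except UnicodeDecodeError:      # byte undefined in this code page
                ch = " "
            cells.append(ch if ch.isprintable() else " ")
        lines.append(f"{row:3}: " + " ".join(cells))
    return "\n".join(lines)


print(codepage_table("cp1251"))
```

Returning a string keeps the decoding logic separate from the `input("")` pause, so the caller decides whether to print, page, or save the output.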

Best answer

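Since you cannot store the "raw" value 0xC3 in a string (and if you have, you shouldn't have: raw, undecoded binary data belongs in a bytes object), the correct way to convert from a raw bytes object is indeed .decode('cp1251'):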

>>> print(b'\xc3'.decode('cp1251'))
Г
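The same idea can be wrapped in a small pair of helpers that map a byte value to its character in a given code page and back. This is a sketch; the names `codepage_char` and `codepage_code` are hypothetical and only use the standard `bytes.decode` and `str.encode` methods:

```python
def codepage_char(code: int, codepage: str) -> str:
    """Hypothetical helper: the character that byte value `code` represents in `codepage`.

    Raises ValueError if `code` is outside 0..255, and UnicodeDecodeError
    if the byte is undefined in that code page.
    """
    return bytes([code]).decode(codepage)


def codepage_code(char: str, codepage: str) -> int:
    """Hypothetical inverse helper: the byte value of a single character in `codepage`."""
    encoded = char.encode(codepage)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codepage}")
    return encoded[0]


print(codepage_char(195, "cp1251"))   # Г
print(codepage_char(195, "cp1252"))   # Ã
print(codepage_code("Г", "cp1251"))   # 195
```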

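However, if you already have the value in a string, the simplest approach is to first convert the string to a bytes object using the one-to-one "encoding" Latin-1, and then decode those bytes with the target code page: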

>>> s = 'Ãamma'
>>> print(s.encode('latin1').decode('cp1251'))
Гamma
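The Latin-1 step works because Latin-1 maps code points U+0000 to U+00FF onto bytes 0x00 to 0xFF with the same values, so encoding with it recovers the original bytes exactly. A character above U+00FF cannot have come from a single Latin-1 byte and raises `UnicodeEncodeError`. A minimal sketch, where the function name `reinterpret` is hypothetical:

```python
def reinterpret(text: str, codepage: str) -> str:
    """Hypothetical helper: re-decode a string that was decoded as Latin-1 using `codepage`."""
    return text.encode("latin-1").decode(codepage)


# Every byte value survives the Latin-1 round trip unchanged
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

print(reinterpret("Ãamma", "cp1251"))   # Гamma

try:
    reinterpret("Гamma", "cp1251")      # already Cyrillic: not representable in Latin-1
except UnicodeEncodeError as exc:
    print("cannot reinterpret:", exc.reason)
```

The trick assumes the string was produced by a Latin-1 decode. A string decoded with another code page (for example cp1252, whose 0x80 to 0x9F range differs from Latin-1) would need to be re-encoded with that same code page instead.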

A similar question about converting code page character numbers to Unicode in Python can be found on Stack Overflow: https://stackoverflow.com/questions/53146514/
