
python - What is the difference between unicode(self) and self.__unicode__() in a Python class?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:03:40

While working through a unicode problem, I found that unicode(self) and self.__unicode__() behave differently:

#-*- coding:utf-8 -*-
import sys
import dis

class test():
    def __unicode__(self):
        s = u'中文'
        return s.encode('utf-8')

    def __str__(self):
        return self.__unicode__()

print dis.dis(test)
a = test()
print a

The code above works fine, but if I change self.__unicode__() to unicode(self), it raises an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

The failing code is:

#-*- coding:utf-8 -*-
import sys
import dis

class test():
    def __unicode__(self):
        s = u'中文'
        return s.encode('utf-8')

    def __str__(self):
        return unicode(self)

print dis.dis(test)
a = test()
print a

I am curious how Python handles this. I tried the dis module but did not see much difference:

Disassembly of __str__:
 12           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                0 (__unicode__)
              6 CALL_FUNCTION            0
              9 RETURN_VALUE

versus

Disassembly of __str__:
 10           0 LOAD_GLOBAL              0 (unicode)
              3 LOAD_FAST                0 (self)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE

Best answer

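You are returning bytes from your __unicode__ method. unicode(self) expects that method to return unicode; when it gets a byte string instead, it decodes it with the default codec (ASCII), which raises the UnicodeDecodeError. Calling self.__unicode__() directly performs no such conversion and simply hands the bytes back, so the bytecode of __str__ cannot show the difference.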

To make it clear:

In [18]: class Test(object):
    def __unicode__(self):
        return u'äö↓'.encode('utf-8')
    def __str__(self):
        return unicode(self)
   ....:

In [19]: class Test2(object):
    def __unicode__(self):
        return u'äö↓'
    def __str__(self):
        return unicode(self)
   ....:

In [20]: t = Test()

In [21]: t.__str__()
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
/home/dav1d/<ipython-input-21-e2650f29e6ea> in <module>()
----> 1 t.__str__()

/home/dav1d/<ipython-input-18-8bc639cbc442> in __str__(self)
      3         return u'äö↓'.encode('utf-8')
      4     def __str__(self):
----> 5         return unicode(self)
      6

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

In [22]: unicode(t)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
/home/dav1d/<ipython-input-22-716c041af66e> in <module>()
----> 1 unicode(t)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

In [23]: t2 = Test2()

In [24]: t2.__str__()
Out[24]: u'\xe4\xf6\u2193'

In [25]: str(_) # _ = last result
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
/home/dav1d/<ipython-input-25-3a1a0b74e31d> in <module>()
----> 1 str(_) # _ = last result

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

In [26]: unicode(t2)
Out[26]: u'\xe4\xf6\u2193'

In [27]: class Test3(object):
    def __unicode__(self):
        return u'äö↓'
    def __str__(self):
        return unicode(self).encode('utf-8')
   ....:

In [28]: t3 = Test3()

In [29]: t3.__unicode__()
Out[29]: u'\xe4\xf6\u2193'

In [30]: t3.__str__()
Out[30]: '\xc3\xa4\xc3\xb6\xe2\x86\x93'

In [31]: print t3
äö↓

In [32]: print unicode(t3)
äö↓

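print a, or in my example print t, calls t.__str__, which is expected to return bytes. If you make it return unicode instead, Python tries to encode the result with ASCII, and that fails for non-ASCII characters.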
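The two implicit conversions described here can be written out explicitly. The following is a minimal sketch, assuming the default encoding is ASCII (the Python 2 default unless it has been changed); the helper names try_ascii_decode and try_ascii_encode are made up for illustration and are not part of any library. It only mimics the conversions with explicit decode/encode calls, so it runs on Python 2 and Python 3 alike.

```python
# -*- coding: utf-8 -*-
# Sketch: the ASCII conversions Python 2 applies implicitly, spelled out by hand.

text = u'中文'
raw = text.encode('utf-8')  # UTF-8 bytes; the first byte is 0xe4


def try_ascii_decode(data):
    """Mimic unicode(obj) when __unicode__ returned bytes: decode with ASCII."""
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc


def try_ascii_encode(value):
    """Mimic str(obj) when __str__ returned unicode: encode with ASCII."""
    try:
        return value.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc


decode_error = try_ascii_decode(raw)   # same failure as unicode(t) in the session
encode_error = try_ascii_encode(text)  # same failure as str(_) in the session

print(type(decode_error).__name__)
print(type(encode_error).__name__)
```

Both helpers return the exception object rather than raising it, which makes it easy to see that the decode failure starts at byte 0, the same position reported in the question's traceback.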

Easy fix: make __unicode__ return unicode and __str__ return bytes.
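As a rough sketch of that fix, the class below keeps __unicode__ returning text and lets __str__ do the encoding. The class name Greeting, the PY2 flag, and the choice of UTF-8 as the output encoding are assumptions for illustration; the version check is only there so the same snippet also runs on Python 3, where __str__ must return text rather than bytes.

```python
# -*- coding: utf-8 -*-
import sys

# Assumption: only Python 2 needs a bytes-returning __str__.
PY2 = sys.version_info[0] == 2


class Greeting(object):  # hypothetical example class
    def __init__(self, text):
        self.text = text

    def __unicode__(self):
        # Always return text (unicode), never encoded bytes.
        return self.text

    def __str__(self):
        if PY2:
            # Python 2: __str__ should return bytes, so encode explicitly.
            return self.__unicode__().encode('utf-8')
        # Python 3: str is already text, so return it unchanged.
        return self.__unicode__()


g = Greeting(u'äö↓')
print(g)
```

Encoding explicitly in __str__ means the implicit ASCII conversion is never triggered, which is the same idea as Test3 in the session above.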

Regarding "python - What is the difference between unicode(self) and self.__unicode__() in a Python class?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11117156/
