python - How does Python treat unicode and non-unicode tuples as equal?

Reposted by: 太空宇宙. Updated: 2023-11-03 13:38:02

I am using Python 2.7.11.

I have 2 tuples:

>>> t1 = (u'aaa', u'bbb')
>>> t2 = ('aaa', 'bbb')

I tried this:

>>> t1==t2
True

How does Python treat unicode and non-unicode strings as the same?

Best answer

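Python 2 considers bytestrings and unicode strings equal when their contents match. By the way, this has nothing to do with the containing tuple. Instead, it comes from an implicit type conversion, which I will explain below.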

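It is difficult to demonstrate this with "simple" ascii code points, so to see what really goes on under the hood, we can provoke a failure by using a higher code point: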

>>> bites = u'Ç'.encode('utf-8')
>>> unikode = u'Ç'
>>> print bites
Ç
>>> print unikode
Ç
>>> bites == unikode
/Users/wim/Library/Python/2.7/bin/ipython:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  #!/usr/bin/python
False

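On seeing the unicode and bytes comparison above, Python implicitly tried to decode the bytestring into a unicode object, assuming the bytes were encoded with sys.getdefaultencoding() (which is 'ascii' on my platform).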

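In the case I just showed above, this failed because the bytes were actually encoded in 'utf-8', and the 'ascii' codec cannot decode bytes outside the 7-bit range. Now, let's make it "work":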

>>> bites = u'Ç'.encode('ISO8859-1')
>>> unikode = u'Ç'
>>> import sys
>>> reload(sys)   # please don't ever actually use this hack, guys 
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('ISO8859-1')
>>> bites == unikode
True

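Your comparison "works" in much the same way, but using the 'ascii' codec: 'aaa' and 'bbb' decode cleanly, so each element compares equal to its unicode counterpart. This implicit conversion between bytes and unicode is actually pretty evil and can cause a lot of pain, so it was decided to stop doing it in Python 3, because "explicit is better than implicit".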
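To make the rule concrete, here is a minimal sketch in Python 3 that imitates the Python 2 comparison described above. The helper name `py2_style_equal` and its `default_encoding` parameter are made up for illustration; the real logic lives inside the Python 2 interpreter, so treat this as an approximation of the behavior, not its implementation.

```python
def py2_style_equal(byte_string, text, default_encoding="ascii"):
    """Hypothetical helper imitating Python 2's bytes == unicode comparison."""
    try:
        # Python 2 implicitly decoded the bytestring with the default encoding.
        decoded = byte_string.decode(default_encoding)
    except UnicodeDecodeError:
        # Python 2 emitted a UnicodeWarning here and treated the operands as unequal.
        return False
    return decoded == text


c_cedilla = "\u00c7"  # the character Ç used in the answer's examples

# Pure ascii bytes decode cleanly, so the strings from the question compare equal.
print(py2_style_equal(b"aaa", "aaa"))

# utf-8 bytes for a non-ascii character cannot be decoded as ascii.
print(py2_style_equal(c_cedilla.encode("utf-8"), c_cedilla))

# With a matching default encoding, the decoding succeeds.
print(py2_style_equal(c_cedilla.encode("ISO8859-1"), c_cedilla,
                      default_encoding="ISO8859-1"))

# Tuples compare element by element, so the same rule applies to each pair.
t1 = ("aaa", "bbb")
t2 = (b"aaa", b"bbb")
print(all(py2_style_equal(b, u) for b, u in zip(t2, t1)))
```

The second call corresponds to the failing `bites == unikode` comparison, and the third to the version that "worked" after the default encoding was changed.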

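As a small digression: on Python 3+, both of your literals are unicode string literals, so they are equal anyway. The u prefix is silently ignored. If you want a bytestring literal in Python 3, you need to specify it like b'this'. Then you would want to either 1) explicitly decode the bytes, or 2) explicitly encode the unicode object before making the comparison.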
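The two options can be sketched in Python 3 as follows. The variable names are illustrative, and 'ascii' is assumed as the encoding only because the strings in the question are plain ascii; use whatever encoding your bytes were actually produced with.

```python
raw = b"aaa"    # bytes literal: needs the b prefix in Python 3
text = u"aaa"   # the u prefix is accepted but changes nothing: this is a str

# Python 3 never decodes implicitly, so bytes and str are simply unequal.
print(raw == text)                   # False

# Option 1: explicitly decode the bytes, then compare str with str.
print(raw.decode("ascii") == text)   # True

# Option 2: explicitly encode the str, then compare bytes with bytes.
print(raw == text.encode("ascii"))   # True
```

Either option works. Decoding at the point where bytes enter the program and comparing text with text is usually the easier habit to keep consistent.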

Regarding "python - How does Python treat unicode and non-unicode tuples as equal?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36563265/