
python - Why does the Python json decoder raise on a string with a UTF-16 BOM?

Reposted. Author: 行者123. Updated: 2023-12-01 01:56:38

Digging into the Python json decoder implementation, I noticed that if a string passed to json.loads starts with \ufeff, which is the UTF-16 BOM, it raises a JSONDecodeError:

    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)", s, 0)

(GitHub)
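The behavior described above can be reproduced with a few lines. This is a minimal sketch, not part of the original question; the sample document and the variable names are made up for illustration.

```python
import json

# A decoded str whose first code point is U+FEFF (sample payload is illustrative).
text = "\ufeff" + '{"key": "value"}'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    # JSONDecodeError exposes the bare message and the failing offset.
    error_message = exc.msg
    error_pos = exc.pos
else:
    error_message = None
    error_pos = None

print(error_message, error_pos)
```

Note that the check runs on an already decoded str, so at this point the decoder can no longer tell which byte encoding the text originally came from.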

RFC 3629 (the UTF-8 one) states two cases in which a UTF-16 BOM should be forbidden, but neither of them seems to apply to JSON:

o A protocol SHOULD forbid use of U+FEFF as a signature for those textual protocol elements that the protocol mandates to be always UTF-8, the signature function being totally useless in those cases.

o A protocol SHOULD also forbid use of U+FEFF as a signature for those textual protocol elements for which the protocol provides character encoding identification mechanisms, when it is expected that implementations of the protocol will be in a position to always use the mechanisms properly. This will be the case when the protocol elements are maintained tightly under the control of the implementation from the time of their creation to the time of their (properly labeled) transmission.

RFC 7159 (the JSON one) says:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

So it seems to me that UTF-16 should be allowed. Why, then, does Python raise here? Clearly I'm missing something.

Best answer

From the currently most recent JSON RFC (RFC 8259):

Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

Similar language also appears in RFC 7159.
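To make "byte order mark (U+FEFF)" concrete, the sketch below shows how that one code point serializes in different encodings, and which Python codecs consume it when decoding. It is an illustration only; the `{}` payload and the variable names are assumptions chosen for the example.

```python
import codecs

bom = "\ufeff"

# The same code point has a different byte form in each encoding.
utf8_form = bom.encode("utf-8")          # equals codecs.BOM_UTF8
utf16_le_form = bom.encode("utf-16-le")  # equals codecs.BOM_UTF16_LE
utf16_be_form = bom.encode("utf-16-be")  # equals codecs.BOM_UTF16_BE

# Plain "utf-8" keeps the BOM as a leading U+FEFF in the decoded str,
# while "utf-8-sig" drops it.
kept = (codecs.BOM_UTF8 + b"{}").decode("utf-8")
dropped = (codecs.BOM_UTF8 + b"{}").decode("utf-8-sig")

# The generic "utf-16" codec uses the BOM to pick the byte order and consumes it.
utf16_payload = codecs.BOM_UTF16_LE + "{}".encode("utf-16-le")
from_utf16 = utf16_payload.decode("utf-16")

print(repr(kept), repr(dropped), repr(from_utf16))
```

A str that still starts with U+FEFF therefore usually means a UTF-8 file with a signature was decoded with plain "utf-8", which is consistent with the hint in Python's error message to decode using utf-8-sig.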

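JSON implementations are not required to accept a byte order mark, and Python's doesn't. If you want to pass JSON with a byte order mark to Python's JSON parser, you should strip the BOM in an earlier processing stage.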
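One way to strip the BOM in an earlier stage might look like the sketch below. `loads_ignoring_bom` is a hypothetical helper, not part of the json module, and it assumes that any bytes input is UTF-8 encoded.

```python
import codecs
import json


def loads_ignoring_bom(data):
    """Hypothetical helper: parse JSON, tolerating a leading BOM.

    Assumes bytes input is UTF-8; other encodings need a different codec.
    """
    if isinstance(data, (bytes, bytearray)):
        # "utf-8-sig" drops a leading UTF-8 BOM if present and otherwise
        # behaves like plain "utf-8".
        data = bytes(data).decode("utf-8-sig")
    if data.startswith("\ufeff"):
        # A str that was decoded elsewhere may still carry U+FEFF.
        data = data[1:]
    return json.loads(data)


from_str = loads_ignoring_bom("\ufeff" + '{"a": 1}')
from_bytes = loads_ignoring_bom(codecs.BOM_UTF8 + b'{"a": 1}')
print(from_str, from_bytes)
```

When reading from a file, opening it with `encoding="utf-8-sig"` has the same effect as the bytes branch. Recent Python 3 versions also accept bytes in json.loads and pick a codec from the leading bytes, so passing the raw bytes may avoid the problem altogether; that is worth checking against the Python version in use.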

Regarding "python - Why does the Python json decoder raise on a string with a UTF-16 BOM?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50106749/
