
winapi - Should the MB_ERR_INVALID_CHARS flag be used for UTF-8 conversion with MultiByteToWideChar?

Reposted. Author: 行者123. Updated: 2023-12-01 15:20:23

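When converting from Unicode UTF-8 to Unicode UTF-16 with the Win32 API MultiByteToWideChar(), should the MB_ERR_INVALID_CHARS flag be used?

In other words, when the input contains invalid UTF-8, which of the following is the better behavior, and why?

  • Make the MultiByteToWideChar() call fail (by passing the MB_ERR_INVALID_CHARS flag)
  • Simply replace the invalid UTF-8 input characters with the REPLACEMENT CHARACTER U+FFFD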

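    Best answer

    From a security point of view, using MB_ERR_INVALID_CHARS when converting from UTF-8 to UTF-16 seems to be the best practice, especially with regard to the problem of ill-formed UTF-8 subsequences (as described in "Unicode Technical Report #36: Unicode Security Considerations"):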

    3.1.1 Ill-Formed Subsequences

    Suppose that a UTF-8 converter is iterating through input UTF-8 bytes, converting to an output character encoding. If the converter encounters an ill-formed UTF-8 sequence it can treat it as an error in a number of different ways, including substituting a character like U+FFFD, SUB, "?", or SPACE. However, it must not consume any valid successor bytes. For example, suppose we have the following sequence:

    X = <... 41 C2 3E 42 ... >

    This sequence overall is ill-formed, because it contains an ill-formed substring, namely the <C2> [...]

    The UTF-8 converter can stop at the C2 byte, or substitute a character or sequence like U+FFFD and continue. However, it must not consume the 3E byte if it continues. [...]

    Consuming a subsequent byte (such as 3E above) is not only non-conformant; it can lead to security breaches. [...]
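
To make the rule quoted above concrete, here is a minimal portable sketch of a substituting decoder that never consumes a valid successor byte. It is not how MultiByteToWideChar() is implemented; the function name `DecodeUtf8Lossy` is hypothetical, and producing UTF-32 code points (rather than UTF-16) is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any Windows API): decodes UTF-8 into
// code points, emitting U+FFFD for each ill-formed subsequence. The byte
// that breaks a sequence is left in place and decoded on its own.
std::u32string DecodeUtf8Lossy(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {            // ASCII: one byte, one code point
            out.push_back(b0);
            ++i;
            continue;
        }

        // The lead byte fixes the number of continuation bytes and the
        // allowed range of the first one (this rejects overlong forms,
        // surrogates, and values above U+10FFFF).
        int extra = 0;
        unsigned char lo = 0x80, hi = 0xBF;
        char32_t cp = 0;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            extra = 1; cp = b0 & 0x1F;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            extra = 2; cp = b0 & 0x0F;
            if (b0 == 0xE0) lo = 0xA0;
            if (b0 == 0xED) hi = 0x9F;
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            extra = 3; cp = b0 & 0x07;
            if (b0 == 0xF0) lo = 0x90;
            if (b0 == 0xF4) hi = 0x8F;
        } else {                    // stray continuation or invalid lead byte
            out.push_back(0xFFFD);
            ++i;
            continue;
        }

        std::size_t j = i + 1;
        bool ok = true;
        for (int k = 0; k < extra; ++k, ++j) {
            if (j >= in.size()) { ok = false; break; }   // truncated input
            const unsigned char b = static_cast<unsigned char>(in[j]);
            if (b < lo || b > hi) { ok = false; break; } // not a valid continuation
            cp = (cp << 6) | (b & 0x3F);
            lo = 0x80; hi = 0xBF;   // later continuation bytes use the full range
        }
        out.push_back(ok ? cp : static_cast<char32_t>(0xFFFD));
        i = j;  // on failure, j still points at the offending byte: it is not consumed
    }
    return out;
}
```

For the report's example bytes `41 C2 3E 42`, this sketch yields U+0041, U+FFFD, U+003E, U+0042: the `3E` after the lone `C2` is kept as its own character rather than swallowed.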



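    In fact, passing the MB_ERR_INVALID_CHARS flag makes the MultiByteToWideChar() API fail when an invalid UTF-8 sequence is present, so there is no risk that subsequent code (for example, the calling code) consumes bytes that follow the invalid substring.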
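
A strict conversion along these lines might look like the sketch below. The helper name `Utf8ToUtf16Strict` and the choice to report failure by throwing `std::runtime_error` are assumptions made for illustration, not part of the original answer. The code is wrapped in `#ifdef _WIN32` only so that the file still compiles on other platforms.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 to UTF-16 and throws instead of
// silently substituting U+FFFD when the input is not valid UTF-8.
std::wstring Utf8ToUtf16Strict(const std::string& utf8)
{
    if (utf8.empty()) {
        return std::wstring();  // a zero-length input would make the API fail
    }
    // MultiByteToWideChar takes int lengths, so reject oversized input.
    if (utf8.size() > static_cast<size_t>((std::numeric_limits<int>::max)())) {
        throw std::overflow_error("UTF-8 input too long");
    }
    const int srcLen = static_cast<int>(utf8.size());

    // First call: query the required UTF-16 length in wchar_t units.
    const int dstLen = ::MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), srcLen, nullptr, 0);
    if (dstLen == 0) {
        // With MB_ERR_INVALID_CHARS, invalid input is reported through
        // GetLastError() as ERROR_NO_UNICODE_TRANSLATION.
        if (::GetLastError() == ERROR_NO_UNICODE_TRANSLATION) {
            throw std::runtime_error("invalid UTF-8 sequence in input");
        }
        throw std::runtime_error("MultiByteToWideChar failed");
    }

    // Second call: perform the conversion into the sized buffer.
    std::wstring utf16(static_cast<size_t>(dstLen), L'\0');
    const int written = ::MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), srcLen, &utf16[0], dstLen);
    if (written == 0) {
        throw std::runtime_error("MultiByteToWideChar failed");
    }
    return utf16;
}
#endif
```

A caller that prefers lossy behavior could pass 0 instead of MB_ERR_INVALID_CHARS; on Windows Vista and later the API then substitutes U+FFFD for illegal sequences rather than failing.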

    Regarding "winapi - Should the MB_ERR_INVALID_CHARS flag be used for UTF-8 conversion with MultiByteToWideChar?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22824537/
