
c++ - Correct behavior for string literal concatenation (C++11 phase 6 of translation)

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 23:45:08

I'm fairly sure that Visual C++ 2015 has a bug here, but I'm not 100% certain.

Code:

// Encoding: UTF-8 with BOM (required by Visual C++).
#include <stdlib.h>

auto main()
    -> int
{
    auto const s = L""
        "𐐷 is not in the Unicode BMP!";
    return s[0] > 256? EXIT_SUCCESS : EXIT_FAILURE;
}

Result with g++:

[H:\scratchpad\simple_text_io]> g++ --version | find "++"
g++ (i686-win32-dwarf-rev1, Built by MinGW-W64 project) 6.2.0

[H:\scratchpad\simple_text_io]> g++ compiler_bug_demo.cpp

[H:\scratchpad\simple_text_io]> run a
Process exit code = 0.

[H:\scratchpad\simple_text_io]> _

Result with Visual C++:

[H:\scratchpad\simple_text_io]> cl /nologo- 2>&1 | find "++"
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23026 for x86

[H:\scratchpad\simple_text_io]> cl compiler_bug_demo.cpp /Feb
compiler_bug_demo.cpp
compiler_bug_demo.cpp(8): warning C4566: character represented by universal-character-name '\U00010437' cannot be represented in the current code page (1252)

[H:\scratchpad\simple_text_io]> run b
Process exit code = 1.

[H:\scratchpad\simple_text_io]> _

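Is any UB involved here, and if not, which compiler's behavior is correct?

Addendum:

The behavior of both compilers is unchanged if I use the lowercase Greek pi, "π", which is in the BMP, so whether the character is in the BMP doesn't seem to matter.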

Best answer

From [lex.string]:

  1. In translation phase 6, adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior. [ Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a literal has been translated into a value from the appropriate character set), a string literal’s initial rawness has no effect on the interpretation or well-formedness of the concatenation. —end note ] Table 8 has some examples of valid concatenations.

So there's no UB here, but phase 5 of translation may already have changed the value of some characters:

  1. Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

Regarding c++ - Correct behavior for string literal concatenation (C++11 phase 6 of translation), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41460467/
