
c++ - What assumptions are safe to make about a C++ implementation's character set?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 06:46:31

Section 6.2.3 of The C++ Programming Language says:

It is safe to assume that the implementation character set includes the decimal digits, the 26 alphabetic characters of English, and some of the basic punctuation characters. It is not safe to assume that:

  • There are no more than 127 characters in an 8-bit character set (e.g., some sets provide 255 characters).

  • There are no more alphabetic characters than English provides (most European languages provide more, e.g., æ, þ, and ß).

  • The alphabetic characters are contiguous (EBCDIC leaves a gap between 'i' and 'j').

  • Every character used to write C++ is available (e.g., some national character sets do not provide {, }, [, ], |, and \).

  • A char fits in 1 byte. There are embedded processors without byte accessing hardware for which a char is 4 bytes. Also, one could reasonably use a 16-bit Unicode encoding for the basic chars.

I'm not sure I understand the last two statements.

Section 2.3 of the standard says:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
...

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits.

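We can see that characters such as { } [ ] | \ are, per the standard, part of the basic execution character set. So why does TC++PL say it is not safe to assume that these characters are available in the implementation's character set?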

As for the size of a char, section 5.3.3 of the standard says:

The sizeof operator yields the number of bytes in the object representation of its operand. ... sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1.

We can see that the standard specifies that a char is 1 byte. What is TC++PL trying to say here?

Best answer

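  • The word "byte" seems to be used sloppily in the first quote. As far as C++ is concerned, a byte is always one char, but the number of bits it contains is platform-dependent (and available as CHAR_BIT). Sometimes you want to say "a byte is eight bits", which gives the word a different meaning, and that is probably the intended meaning of the phrase "a char is 4 bytes".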

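  • The execution character set may well be larger than, or incompatible with, the input character set the environment provides. Trigraphs and alternative tokens exist so that execution-set characters can be written with fewer input characters on such restricted platforms (e.g., not is in every respect identical to !, which is not available in every character set or keyboard layout).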

For the question "c++ - What assumptions are safe to make about a C++ implementation's character set?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21364673/
