gpt4 book ai didi

c - 字符到整数转换的巧妙之处

转载 作者:行者123 更新时间:2023-11-30 14:57:12 24 4
gpt4 key购买 nike

有人可以清楚地解释一下 K&R 中的这些台词的实际含义吗:

"When a char is converted to an int, can it ever produce a negative integer? The answer varies from machine to machine. The definition of C guarantees that any character in the machine's standard printing character set will never be negative, but arbitrary bit patterns stored in character variables may appear to be negative on some machines,yet positive on others".

最佳答案

该标准有两个或多或少相关的部分 - ISO/IEC 9899:2011。

6.2.5 Types

¶3 An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

¶15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.45)

45) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

这定义了您引用的 K&R 的内容。其他相关部分定义了基本执行字符集是什么。

5.2.1 Character sets

¶1 Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set). Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters. The combined set is also called the extended character set. The values of the members of the execution character set are implementation-defined.

¶2 In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters. A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

¶3 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing horizontal tab, vertical tab, and form feed. The representation of each member of the source and execution basic character sets shall fit in a byte. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. In source files, there shall be some way of indicating the end of each line of text; this International Standard treats such an end-of-line indicator as if it were a single new-line character. In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line. If any other characters are encountered in a source file (except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token), the behavior is undefined.

¶4 A letter is an uppercase letter or a lowercase letter as defined above; in this International Standard the term does not include other characters that are letters in other alphabets.

¶5 The universal character name construct provides a way to name other characters.

这些规则的一个结果是,如果机器使用 8 位字符且 EBCDIC编码,然后是普通 char必须是无符号类型,因为数字在 EBCDIC 中的代码为 240..249。

关于c - 字符到整数转换的巧妙之处,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44077321/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com