
c++ - What is the "initial shift state"?

Reposted. Author: 行者123. Updated: 2023-12-02 05:32:15

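The standard refers to the term "initial shift state" frequently, and it appears in several contexts, such as multibyte characters (strings) and files. But the standard never explains what it actually is.

What is it? And more generally, what does "shift" mean here?

In addition:

Because the term seems to me to be used in different contexts (for characters, for strings, and for files), I will point out some passages from the standard (specifically ISO/IEC 9899:2018 (C18)) that contain the term "initial shift state":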

§ 5.2.1.2 - Multibyte characters

— A multibyte character set may have a state-dependent encoding, wherein each sequence of multibyte characters begins in an initial shift state and enters other locale-specific shift states when specific multibyte characters are encountered in the sequence.

— An identifier, comment, string literal, character constant, or header name shall begin and end in the initial shift state.


§ 7.21.3 - Files

"— A file need not begin nor end in the initial shift state.274)"

"274)Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state."


§ 7.21.6.2 - The fscanf function

For the s conversion specifier:

"If an l length modifier is present, the input shall be a sequence of multibyte characters that begins in the initial shift state."
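For illustration, the l-modified s conversion can be tried with sscanf, which takes the same conversion specifications as fscanf. This is only a sketch: the helper name scan_wide_word and the 16-element buffer are assumptions made for the example, and it relies on the default "C" locale with plain ASCII input, where there is only one shift state to begin with.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads the first whitespace-delimited word of
 * `input` as multibyte characters and stores it as wide characters.
 * The library converts the bytes as if by mbrtowc, starting from a
 * zero-initialized (initial) conversion state.
 * Returns the number of successful conversions (1 on success). */
int scan_wide_word(const char *input, wchar_t out[16])
{
    /* Width 15 leaves room for the terminating null wide character. */
    return sscanf(input, "%15ls", out);
}
```

Called as scan_wide_word("hello world", buf), this stores the wide string L"hello" in buf. With a state-dependent encoding, the same call would only be well defined if the input bytes start in the initial shift state.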

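  • What is meant by "initial shift state"? What is it?
  • What does "shift" mean in this context?
  • In the context of strings, is it the double quote " that marks the beginning and end of a format string?

Thanks in advance.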

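Best answer

The shift state is the state that determines how certain byte sequences are interpreted as characters; it depends on the encoding.

From https://www.gnu.org/software/libc/manual/html_node/Shift-State.html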

In some multibyte character codes, the meaning of any particular byte sequence is not fixed; it depends on what other sequences have come earlier in the same string. Typically there are just a few sequences that can change the meaning of other sequences; these few are called shift sequences and we say that they set the shift state for other sequences that follow.

To illustrate shift state and shift sequences, suppose we decide that the sequence 0200 (just one byte) enters Japanese mode, in which pairs of bytes in the range from 0240 to 0377 are single characters, while 0201 enters Latin-1 mode, in which single bytes in the range from 0240 to 0377 are characters, and interpreted according to the ISO Latin-1 character set. This is a multibyte code that has two alternative shift states (“Japanese mode” and “Latin-1 mode”), and two shift sequences that specify particular shift states.
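The toy encoding from the quoted passage can be sketched as a small decoder. This is an illustration under assumptions the manual does not state: the names toy_count_chars, LATIN1_MODE and JAPANESE_MODE are made up, bytes outside the ranges the manual mentions are treated as single-byte characters, and an incomplete pair in Japanese mode is reported as an error.

```c
#include <stddef.h>

/* The two shift states of the toy encoding (names are hypothetical). */
enum shift_state { LATIN1_MODE, JAPANESE_MODE };

/* Counts the characters in buf[0..len), starting in *state and leaving
 * the final shift state in *state. Returns -1 if a two-byte character
 * is truncated or malformed. */
long toy_count_chars(const unsigned char *buf, size_t len,
                     enum shift_state *state)
{
    long count = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];

        if (b == 0200) {                 /* shift sequence: Japanese mode */
            *state = JAPANESE_MODE;
            i += 1;
        } else if (b == 0201) {          /* shift sequence: Latin-1 mode */
            *state = LATIN1_MODE;
            i += 1;
        } else if (*state == JAPANESE_MODE && b >= 0240) {
            /* In Japanese mode a pair of bytes in 0240..0377 is one character. */
            if (i + 1 >= len || buf[i + 1] < 0240)
                return -1;
            count += 1;
            i += 2;
        } else {
            /* Latin-1 mode, or a byte below 0200: one byte, one character
             * (the treatment of other bytes is an assumption). */
            count += 1;
            i += 1;
        }
    }
    return count;
}
```

The same four bytes 0241 0242 0243 0244 count as four characters when decoding starts in Latin-1 mode and as two when it starts in Japanese mode, so the result depends on the state the decoder starts in.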

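The initial shift state is simply the shift state at the start, i.e. the state in effect when processing begins; in the example above, it would be whichever of Latin-1 mode or Japanese mode the sequences in question begin in.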
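In the C library, the restartable conversion functions carry this state in an mbstate_t object, and a zero-valued mbstate_t is one way to describe the initial conversion state. A minimal sketch, assuming the default "C" locale and ASCII input; the helper name count_mb_chars is hypothetical:

```c
#include <string.h>
#include <wchar.h>

/* Counts the multibyte characters in s, starting from the initial
 * conversion state. Returns -1 if the input is invalid, is cut off
 * mid-character, or does not end back in the initial state. */
long count_mb_chars(const char *s)
{
    mbstate_t st;
    size_t remaining = strlen(s);
    long count = 0;

    memset(&st, 0, sizeof st);   /* zero-valued: initial conversion state */

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;           /* invalid or incomplete sequence */
        if (n == 0)
            break;               /* null character reached */
        s += n;
        remaining -= n;
        count += 1;
    }

    /* mbsinit reports whether st describes the initial conversion state. */
    return mbsinit(&st) ? count : -1;
}
```

For plain ASCII text such as "abc" the result is 3 and the state never leaves the initial one. The final mbsinit check mirrors the standard's requirement that certain sequences both begin and end in the initial shift state.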

For more on c++ - What is the "initial shift state"?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/59471459/
