c++ - What is the current modern term for "Multi-byte Character Set"?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 00:27:20

I used to be confused:

Confusion on Unicode and Multibyte Articles

After reading the comments from all the contributors there, and also:

looking at an old article (from 2001): http://www.hastingsresearch.com/net/04-unicode-limitations.shtml , which describes Unicode as:

being a 16-bit character definition allowing a theoretical total of over 65,000 characters. However, the complete character sets of the world add up to over 170,000 characters.

and looking at a current "modern" article: http://en.wikipedia.org/wiki/Unicode

The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters), the now-obsolete UCS-2 (which uses 2 bytes for all characters, but does not include every character in the Unicode standard), and UTF-16 (which extends UCS-2, using 4 bytes to encode characters missing from UCS-2).

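It seems that in the VC2008 compiler options, the "Unicode" choice under Character Set really means "Unicode encoded in UCS-2" (or is it UTF-16? I'm not sure).

I tried to verify this by running the following code under VC2008: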

#include <cstdio>    // getchar
#include <iostream>

int main()
{
    // Each literal holds 3 wchar_t plus a terminating null;
    // wchar_t is 2 bytes under VC++, hence 4 * 2 = 8.

    // Use unicode encoded in UCS-2?
    std::cout << sizeof(L"我爱你") << std::endl;
    // Use unicode encoded in UCS-2?
    std::cout << sizeof(L"abc") << std::endl;
    getchar();  // keep the console window open

    // Compiled using options Character Set : Use Unicode Character Set.
    // print out 8, 8

    // Compiled using options Character Set : Multi-byte Character Set.
    // print out 8, 8
}

It seems that when compiling with the Unicode Character Set option, the result matches my assumption.

But what about Multi-byte Character Set? What does Multi-byte Character Set mean in the current "modern" world? :)

Best answer

http://en.wikipedia.org/wiki/Multi-byte_character_set

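MBCS is a term for a class of character encodings whose characters cannot all be represented in a single byte, hence "multi-byte character set". To decode a string in such a format correctly, you need a code page that tells you which byte combinations map to which characters. ISO/IEC 8859 defines a set of code-page standards (strictly speaking these are single-byte, 8-bit encodings rather than multi-byte ones), but according to Wikipedia, ISO stopped maintaining them in 2004, presumably to focus on Unicode.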

So I guess the modern term for MBCS is "deprecated in favor of Unicode".

This question, c++ - What is the current modern term for "Multi-byte Character Set", comes from Stack Overflow: https://stackoverflow.com/questions/2414261/
