C Unicode: How do I apply C11 standard amendment DR488 fix to C11 standard function c16rtomb()?

Reposted by: 太空宇宙. Updated: 2023-11-04 03:14:37

Question:

As stated in the Notes section of the C reference page for the function c16rtomb on CPPReference:

In C11 as published, unlike mbrtoc16, which converts variable-width multibyte (such as UTF-8) to variable-width 16-bit (such as UTF-16) encoding, this function can only convert single-unit 16-bit encoding, meaning it cannot convert UTF-16 to UTF-8 despite that being the original intent of this function. This was corrected by the post-C11 defect report DR488.

Below this passage, the C reference page provides example source code, preceded by the following sentence:

Note: this example assumes the fix for the defect report 488 is applied.

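This sentence implies that there is a way to take DR488 and somehow "apply" the fix to the C11 standard function c16rtomb.

I would like to know how to apply the fix for GCC, because it seems to me that the fix has already been applied in Visual Studio 2017's Visual C++ as of toolset v141.

The behavior I see in GCC, when debugging the code in GDB, is consistent with what DR488 describes, quoted below: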

Section 7.28.1 describes the function c16rtomb(). In particular, it states "When c16 is not a valid wide character, an encoding error occurs". "wide character" is defined in section 3.7.3 as "value representable by an object of type wchar_t, capable of representing any character in the current locale". This wording seems to imply that, e.g. for the common cases (e.g, an implementation that defines __STDC_UTF_16__ and a program that uses an UTF-8 locale), c16rtomb() will return -1 when it encounters a character that is encoded as multiple char16_t (for UTF-16 a wide character can be encoded as a surrogate pair consisting of two char16_t). In particular, c16rtomb() will not be able to process strings generated by mbrtoc16().

The text in bold is the behavior being described.

Source code:
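To make the surrogate-pair case from the defect report concrete, here is a minimal sketch in C of how two char16_t units combine into one code point. It does not depend on any locale or on c16rtomb itself, and the helper names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>
#include <uchar.h>

/* Hypothetical helpers for illustration; they are not part of <uchar.h>. */

/* Nonzero if unit is a UTF-16 high (leading) surrogate, 0xD800..0xDBFF. */
int is_high_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* Nonzero if unit is a UTF-16 low (trailing) surrogate, 0xDC00..0xDFFF. */
int is_low_surrogate(char16_t unit) {
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Combines a surrogate pair into a code point in U+10000..U+10FFFF.
 * Returns 0 if the two units do not form a valid pair. */
uint32_t combine_surrogates(char16_t high, char16_t low) {
    if (!is_high_surrogate(high) || !is_low_surrogate(low))
        return 0;
    return 0x10000u + (((uint32_t)(high - 0xD800) << 10)
                       | (uint32_t)(low - 0xDC00));
}
```

For example, combine_surrogates(0xD83D, 0xDE00) gives 0x1F600. Note that the three characters in the string used in the source code (我是誰) all lie in the Basic Multilingual Plane, so each of them is a single char16_t and none needs a surrogate pair.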

#include <stdio.h>
#include <uchar.h>

#define __STD_UTF_16__

int main() {
    char16_t* ptr_string = (char16_t*) u"我是誰";

    //C++ disallows variable-length arrays. 
    //GCC uses GNUC++, which has a C++ extension for variable length arrays.
    //It is not a truly standard feature in C++ pedantic mode at all.
    //https://stackoverflow.com/questions/40633344/variable-length-arrays-in-c14
    char buffer[64];
    char* bufferOut = buffer;

    //Must zero this object before attempting to use mbstate_t at all.
    mbstate_t multiByteState = {};

    //c16 = 16-bit Characters or char16_t typed characters
    //r = representation
    //tomb = to Multi-Byte Strings
    while (*ptr_string) {
        char16_t character = *ptr_string;
        size_t size = c16rtomb(bufferOut, character, &multiByteState);
        if (size == (size_t) -1)
            break;
        bufferOut += size;
        ptr_string++;
    }

    size_t bufferOutSize = bufferOut - buffer;
    printf("Size: %zu - ", bufferOutSize);
    for (size_t i = 0; i < bufferOutSize; i++) {
        printf("%#x ", +(unsigned char) buffer[i]);
    }

    //This statement is used to set a breakpoint. It does not do anything else.
    int debug = 0;
    return 0;
}

Visual Studio output:

Size: 9 - 0xe6 0x88 0x91 0xe6 0x98 0xaf 0xe8 0xaa 0xb0

GCC output:

Size: 0 -

Best answer

On Linux, you should be able to fix this by calling setlocale(LC_ALL, "en_US.utf8");

Example on ideone

This function does the following, as described in the Microsoft documentation:
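A minimal sketch of that fix in C might look like the following. The two helper names are hypothetical, and the list of locale names is an assumption: which UTF-8 locales are installed varies from system to system, so the code tries a few common spellings. Each converted character goes through a small temporary buffer so that the destination is never overrun.

```c
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: converts a NUL-terminated char16_t string to the
 * multibyte encoding of the current locale. Returns the number of bytes
 * written to dst (no terminating NUL is added), or (size_t)-1 on an
 * encoding error or when dst is too small. */
size_t utf16_to_multibyte(const char16_t *src, char *dst, size_t dst_size) {
    mbstate_t state;
    char tmp[MB_LEN_MAX];
    size_t written = 0;

    /* The conversion state must start out zeroed. */
    memset(&state, 0, sizeof state);

    for (; *src != 0; ++src) {
        size_t n = c16rtomb(tmp, *src, &state);
        if (n == (size_t)-1)
            return (size_t)-1;      /* not representable in this locale */
        if (n > dst_size - written)
            return (size_t)-1;      /* destination buffer too small */
        memcpy(dst + written, tmp, n);
        written += n;
    }
    return written;
}

/* Hypothetical helper: tries a few common UTF-8 locale names (assumed,
 * system-dependent). Returns 1 if one could be selected, 0 otherwise. */
int select_utf8_locale(void) {
    static const char *const names[] = { "en_US.utf8", "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    }
    return 0;
}
```

Calling select_utf8_locale() once at startup and then utf16_to_multibyte(u"我是誰", buffer, sizeof buffer) should produce the same 9 bytes as the Visual Studio output, provided one of those locales is actually installed. Without the setlocale call the program stays in the default "C" locale, where only ASCII characters can be converted.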

Convert a UTF-16 wide character into a multibyte character in the current locale.

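The POSIX documentation is similar. __STD_UTF_16__ does not seem to have any effect in either compiler (the spelling used in the standard is __STDC_UTF_16__). It is supposed to specify the encoding of the source, which should be UTF-16. It does not specify the encoding of the destination.

The Windows documentation seems more inconsistent, because it appears to imply either that setlocale is required or that converting to the ANSI code page is an option.

Regarding "C Unicode: How do I apply C11 standard amendment DR488 fix to C11 standard function c16rtomb()?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53148386/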
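Since that macro is meant to be predefined by the implementation rather than defined by the program, a small probe such as the sketch below can report whether the compiler in use promises UTF-16 for char16_t. The function name char16_is_utf16 is hypothetical.

```c
#include <uchar.h>

/* Hypothetical probe: returns 1 if the implementation predefines
 * __STDC_UTF_16__ (char16_t values are UTF-16 encoded), 0 otherwise. */
int char16_is_utf16(void) {
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}
```

The result only describes the char16_t side of the conversion. The multibyte side is still decided by the current locale.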
