gpt4 book ai didi

mingw 中 UTF-8 的 C++ ctype 方面

转载 作者:搜寻专家 更新时间:2023-10-31 00:49:23 25 4
gpt4 key购买 nike

在项目中,所有内部字符串都以 utf-8 编码保存。该项目移植到 Linux 和 Windows。现在需要 to_lower 功能。

在 POSIX 操作系统上,我可以使用 std::ctype_byname("ru_RU.UTF-8")。但是对于 g++ (Debian 4.3.4-1),ctype::tolower() 无法识别俄语 UTF-8 字符(拉丁文可以小写)。

在 Windows 上,当我尝试使用“ru_RU.UTF-8”参数构造 std::ctype_byname 时,mingw 的标准库抛出异常“std::runtime_error:locale::facet::_S_create_c_locale name not valid”。

如何在 Windows 上为 utf-8 实现/查找 std::ctype?该项目已经依赖于 libiconv(codecvt 方面基于它),但我没有看到用它实现 to_lower 的明显方法。

最佳答案

尝试使用STLport

  Here is a description of how you can use STLport to read/write utf8 files.utf8 is a way of encoding wide characters. As so, management of encoding inthe C++ Standard library is handle by the codecvt locale facet which is partof the ctype category. However utf8 only describe how encoding must beperformed, it cannot be used to classify characters so it is not enough infoto know how to generate the whole ctype category facets of a localeinstance.In C++ it means that the following code will throw an exception tosignal that creation failed:#include // Will throw a std::runtime_error exception.std::locale loc(".utf8");For the same reason building a locale with the ctype facets based onUTF8 is also wrong:// Will throw a std::runtime_error exception:std::locale loc(locale::classic(), ".utf8", std::locale::ctype);The only solution to get a locale instance that will handle utf8 encodingis to specifically signal that the codecvt facet should be based on utf8encoding:// Will succeed if there is necessary platform support.locale loc(locale::classic(), new codecvt_byname(".utf8"));  Once you have obtain a locale instance you can inject it in a file stream toread/write utf8 files:std::fstream fstr("file.utf8");fstr.imbue(loc);You can also access the facet directly to perform utf8 encoding/decoding operations:typedef std::codecvt codecvt_t;const codecvt_t& encoding = use_facet(loc);Notes:1. The dot ('.') is mandatory in front of utf8. This is a POSIX convention, localenames have the following format:language[_country[.encoding]]Ex: 'fr_FR'    'french'    'ru_RU.koi8r'2. utf8 encoding is only supported for the moment under Windows. The less commonutf7 encoding is also supported. 

关于mingw 中 UTF-8 的 C++ ctype 方面,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1354124/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com