
c++ - rapidxml throws an exception on wchar_t content

Reposted. Author: 可可西里. Updated: 2023-11-01 10:34:55

When parsing wchar_t content on the win32 platform, rapidxml may throw a parse_error exception. The content:

<xml att='最好' />

This is my test code:

/*
* @file : TestRapidXmlBug.cpp
* @author: shilyx
* @date : 2015-09-16 11:02:22.886
* @note : Generated by SlxTemplates
*/

#include <Windows.h>
#include "rapidxml.hpp"
#include <iostream>
#include <string>

using namespace std;
using namespace rapidxml;

int main(int argc, char *argv[])
{
    // data block: the UTF-16LE bytes of <xml att='最好' />, NUL-terminated
    unsigned char szData[] = {
        0x3C, 0x00, 0x78, 0x00, 0x6D, 0x00, 0x6C, 0x00, 0x20, 0x00, 0x61, 0x00, 0x74, 0x00, 0x74, 0x00, 0x3D,
        0x00, 0x27, 0x00, 0x00, 0x67, 0x7D, 0x59, 0x27, 0x00, 0x20, 0x00, 0x2F, 0x00, 0x3E, 0x00, 0x00, 0x00};

    // utf8 string
    char szDataUtf8[sizeof(szData) * 10] = "";

    // ucs2 string
    wchar_t *szDataUcs2 = (wchar_t *)szData;

    WideCharToMultiByte(CP_UTF8, 0, szDataUcs2, -1, szDataUtf8, sizeof(szDataUtf8), NULL, NULL);

    try
    {
        xml_document<wchar_t> xml;

        cout<<"-------------------------wchar_t"<<endl;
        xml.parse<0>(szDataUcs2); // will throw parse_error
        cout<<"success"<<endl;
    }
    catch (const parse_error &ex)
    {
        cout<<"exception: "<<ex.what()<<endl;
        cout<<"failed"<<endl;
    }

    try
    {
        xml_document<char> xml;

        cout<<"-------------------------char"<<endl;
        xml.parse<0>(szDataUtf8); // will not throw any exception
        cout<<"success"<<endl;
    }
    catch (const parse_error &ex)
    {
        cout<<ex.what()<<endl;
        cout<<"failed"<<endl;
    }

    return 0;
}

It throws the exception at this point:

// Make sure that end quote is present
if (*text != quote)
    RAPIDXML_PARSE_ERROR("expected ' or \"", text);
++text; // Skip quote

The cause may be:

// Skip characters until predicate evaluates to true
template<class StopPred, int Flags>
static void skip(Ch *&text)
{
    Ch *tmp = text;
    while (StopPred::test(*tmp))
        ++tmp;
    text = tmp;
}

The StopPred::test function:

// Detect attribute value character
template<Ch Quote>
struct attribute_value_pure_pred
{
    static unsigned char test(Ch ch)
    {
        if (Quote == Ch('\''))
            return internal::lookup_tables<0>::lookup_attribute_data_1_pure[static_cast<unsigned char>(ch)];
        if (Quote == Ch('\"'))
            return internal::lookup_tables<0>::lookup_attribute_data_2_pure[static_cast<unsigned char>(ch)];
        return 0; // Should never be executed, to avoid warnings on Comeau
    }
};

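The static_cast turns wchar_t(0x6700) into unsigned char(0x00), so the skip loop stops on the first character of the attribute value instead of at the closing quote.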
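
The truncation can be reproduced without rapidxml. The following is a minimal sketch: table_index repeats the cast from attribute_value_pure_pred, and first_nul_alias is a hypothetical helper (not part of rapidxml) that reports the first non-NUL character whose low byte is 0, which is the case that stops the skip loop here.

```cpp
#include <cstddef>
#include <string>

// Same cast rapidxml applies before indexing its 256-entry lookup tables:
// only the low 8 bits of the character survive.
inline unsigned char table_index(wchar_t ch)
{
    return static_cast<unsigned char>(ch);
}

// Hypothetical helper, not part of rapidxml: position of the first non-NUL
// character that is looked up as if it were NUL, or npos if there is none.
inline std::size_t first_nul_alias(const std::wstring &text)
{
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] != L'\0' && table_index(text[i]) == 0)
            return i;
    return std::wstring::npos;
}
```

For the document in the question, first_nul_alias points at 最 (U+6700), while 好 (U+597D) maps to index 0x7D and would not stop the loop on its own.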


Is this a bug? Or is it wrong to use rapidxml with wchar_t? The last update of rapidxml was on 2013-04-26, so I assumed it should be stable enough.

Best answer

Rapidxml does not fully support UTF-16, UTF-32, or other wide encodings. From its documentation:

Current version does not fully support UTF-16 or UTF-32, so use of wide characters is somewhat incapacitated. However, it should succesfully parse wchar_t strings containing UTF-16 or UTF-32 if endianness of the data matches that of the machine.

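As you have seen, by an interesting coincidence the character 0x6700 becomes 0 when it is cast to the unsigned char that rapidxml uses for its internal table lookups. 0 is not a valid attribute character, so parsing terminates. I think the documentation should make clear that partial support for wide encodings is available, with the caveat that you must not use code points outside Basic Latin and Latin-1 (that is, U+0000 to U+00FF).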
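
If wide input has to be kept, the caveat above can be turned into a pre-check. This is only a sketch: fits_byte_lookup is a hypothetical name, not a rapidxml function, and it simply tests whether every character lies in the U+0000 to U+00FF range the answer describes as safe.

```cpp
#include <algorithm>
#include <string>

// Hypothetical pre-check, not part of rapidxml: true when every character
// is at most 0xFF, so the unsigned char cast leaves its value unchanged.
inline bool fits_byte_lookup(const std::wstring &text)
{
    return std::all_of(text.begin(), text.end(), [](wchar_t ch) {
        return static_cast<unsigned long>(ch) <= 0xFFul;
    });
}
```

A caller could fall back to the UTF-8 path whenever this returns false, rather than risking a parse_error or a silently misread attribute.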

The solution is to use UTF-8 instead.
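
The question's test program already does this conversion with WideCharToMultiByte(CP_UTF8, ...). For code that cannot rely on the Windows API, a hand-written converter is one option. The sketch below is an assumption-laden illustration, not rapidxml code: append_utf8 and utf16_to_utf8 are hypothetical names, the input is taken to be UTF-16 in native byte order, and unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to out, encoded as UTF-8.
inline void append_utf8(std::string &out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 (native byte order) to UTF-8.
inline std::string utf16_to_utf8(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate: substitute the replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can then go through xml_document<char>, as in the second try block of the question. Because parse takes a non-const Ch pointer, the UTF-8 text should live in a writable, NUL-terminated buffer that outlives the document.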

This question, "c++ - rapidxml throws an exception on wchar_t content", corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/32599512/
