
c - PCRE does not match UTF-8 characters

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:51:55

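I am compiling a PCRE pattern with the UTF-8 flag enabled and trying to match it against a UTF-8 char* string, but it does not match and pcre_exec returns a negative value. I pass 65 as the subject length to pcre_exec, which is the number of characters in the string. I believe it expects the number of bytes, so I tried increasing the argument to 70, but I still get the same result. I don't know what else could be making the match fail. Please help before I shoot myself.

(However, if I do not use the PCRE_UTF8 flag, it matches, but offset vector[1] is 30, which is the index of the character just before the Unicode character in the input string.)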

#include "stdafx.h"
#include "pcre.h"
#include <pcre.h> /* PCRE lib NONE */
#include <stdio.h> /* I/O lib C89 */
#include <stdlib.h> /* Standard Lib C89 */
#include <string.h> /* Strings C89 */
#include <iostream>

int main(int argc, char *argv[])
{
    pcre *reCompiled;

    int pcreExecRet;
    int subStrVec[30];
    const char *pcreErrorStr;
    int pcreErrorOffset;
    char* aStrRegex = "(\\?\\w+\\?\\s*=)?\\s*(call|exec|execute)\\s+(?<spName>\\w+)("
        // params can be an empty pair of parenthesis or have parameters inside them as well.
        "\\(\\s*(?<params>[?\\w,]+)\\s*\\)"
        // paramList along with its parenthesis is optional below so a SP call can be just "exec sp_name" for a stored proc call without any parameters.
        ")?";
    reCompiled = pcre_compile(aStrRegex, 0, &pcreErrorStr, &pcreErrorOffset, NULL);
    if(reCompiled == NULL) {
        printf("ERROR: Could not compile '%s': %s\n", aStrRegex, pcreErrorStr);
        exit(1);
    }

    char* line = "?rt?=call SqlTxFunctionTesting(?înFîéld?,?outField?,?inOutField?)";
    pcreExecRet = pcre_exec(reCompiled,
        NULL,
        line,
        65, // length of string
        0, // Start looking at this point
        0, // OPTIONS
        subStrVec,
        30); // Length of subStrVec

    printf("\nret=%d",pcreExecRet);

    //int substrLen = pcre_get_substring(line, subStrVec, pcreExecRet, 1, &mantissa);

}

Best answer

1)

char * q= "î";
printf("%d, %s", q[0], q);

Output:
63, ?
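
The value 63 is the ASCII code of '?': the compiler could not represent î in the character set it used for the literal and substituted a question mark, so the subject never contained UTF-8 bytes at all. Below is a minimal sketch of one way to rule this out, assuming the literal is spelled with explicit UTF-8 byte escapes. The helper utf8_count_code_points and the constant kSmallICircumflex are hypothetical names introduced here for illustration and are not part of PCRE. The sketch also shows why the character count and the byte length of a UTF-8 string differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): counts the code points in a
   NUL-terminated string that is assumed to be valid UTF-8. Every byte
   that is not a continuation byte (bit pattern 10xxxxxx) starts a new
   code point. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* U+00EE (i with circumflex) written as its two UTF-8 bytes, so the
   result does not depend on the encoding of the source file. */
const char kSmallICircumflex[] = "\xC3\xAE";
```

With the subject written this way, strlen(line) gives the byte length while utf8_count_code_points(line) gives the number of characters. The subject in the question contains three two-byte characters (î, î, é), so the two values differ by three.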

2) You have to rebuild PCRE with PCRE_BUILD_PCRE16 (or 32) and PCRE_SUPPORT_UTF, and link against pcre16.lib and/or pcre16.dll. Then you can try this code:

pcre16 *reCompiled;
int pcreExecRet;
int subStrVec[30];
const char *pcreErrorStr;
int pcreErrorOffset;
const wchar_t* aStrRegex = L"(\\?\\w+\\?\\s*=)?\\s*(call|exec|execute)\\s+(?<spName>\\w+)("
    // params can be an empty pair of parentheses or have parameters inside them as well.
    L"\\(\\s*(?<params>[?,\\w\\p{L}]+)\\s*\\)"
    // paramList along with its parentheses is optional below so a SP call can be just "exec sp_name" for a stored proc call without any parameters.
    L")?";
// PCRE_UTF16 is the 16-bit library's name for the UTF flag.
// The casts below assume wchar_t is 16 bits wide, as on Windows.
reCompiled = pcre16_compile((PCRE_SPTR16)aStrRegex, PCRE_UTF16, &pcreErrorStr, &pcreErrorOffset, NULL);
if (reCompiled == NULL) {
    // The pattern is a wide string, so only the narrow error message is printed.
    fprintf(stderr, "ERROR: Could not compile pattern at offset %d: %s\n", pcreErrorOffset, pcreErrorStr);
    exit(1);
}

const wchar_t* line = L"?rt?=call SqlTxFunctionTesting( ?inField?,?outField?,?inOutField?,?fd? )";
pcreExecRet = pcre16_exec(reCompiled,
    NULL,
    (PCRE_SPTR16)line,
    (int)wcslen(line), // length of string in 16-bit code units
    0, // Start looking at this point
    0, // OPTIONS
    subStrVec,
    30); // Length of subStrVec

wprintf(L"\nret=%d", pcreExecRet);
for (int i = 0; i < pcreExecRet; i++) {
    // pcre16_get_substring allocates the result; free it with pcre16_free_substring.
    PCRE_SPTR16 mantissa = NULL;
    int substrLen = pcre16_get_substring((PCRE_SPTR16)line, subStrVec, pcreExecRet, i, &mantissa);
    if (substrLen >= 0) {
        wprintf(L"\nret string=%ls, length=%i\n", (const wchar_t*)mantissa, substrLen);
        pcre16_free_substring(mantissa);
    }
}

3) By default, \w = [0-9A-Z_a-z]. It does not include Unicode symbols.
4) This is really helpful: http://answers.oreilly.com/topic/215-how-to-use-unicode-code-points-properties-blocks-and-scripts-in-regular-expressions/
5) From the PCRE 8.33 source (pcre_exec.c:2251):

/* Find out if the previous and current characters are "word" characters.
It takes a bit more work in UTF-8 mode. Characters > 255 are assumed to
be "non-word" characters. Remember the earliest consulted character for
partial matching. */
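
Points 3 and 5 can be illustrated with a small sketch. This is not PCRE's code: the function names is_default_word_char and word_run_length are hypothetical, and the sketch assumes the default behavior described above, where only [0-9A-Z_a-z] counts as a word character and any code point above 255 is treated as a non-word character. It also assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the default \w class: [0-9A-Z_a-z] only.
   Returns 1 for a word character, 0 otherwise. */
int is_default_word_char(unsigned long code_point)
{
    if (code_point > 255)
        return 0; /* mirrors the quoted comment: > 255 is "non-word" */
    return (code_point >= '0' && code_point <= '9') ||
           (code_point >= 'A' && code_point <= 'Z') ||
           (code_point >= 'a' && code_point <= 'z') ||
           code_point == '_';
}

/* Length of the leading run of word characters in an array of code
   points, i.e. how far a greedy \w+ could advance under this model. */
size_t word_run_length(const unsigned long *code_points, size_t n)
{
    size_t i = 0;
    assert(code_points != NULL || n == 0);
    while (i < n && is_default_word_char(code_points[i]))
        ++i;
    return i;
}
```

Under this model \w+ cannot consume î (U+00EE), which is consistent with the answer adding \p{L} to the character class of the params group.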

Regarding c - PCRE does not match UTF-8 characters, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18329532/
