
c - How can I modify PCRE2 so that \w matches marks?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:21:25

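I am using the PCRE2 library, which can be found here.

As you can see here, PCRE2's \w matches only the L and N categories and the underscore; it does not match the M (mark) category (see here). .NET Regex, however, does match marks (see here).

I want to change the PCRE2 source code so that it behaves like .NET Regex, but I am not sure I am doing it correctly.

My plan is to find the places in the code that reference PT_WORD, such as this one: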

case PT_WORD:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
     PRIV(ucp_gentype)[prop->chartype] == ucp_N ||
     fc == CHAR_UNDERSCORE) == (Fop == OP_NOTPROP))

and then add another line, like this:

case PT_WORD:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
     PRIV(ucp_gentype)[prop->chartype] == ucp_N ||
     PRIV(ucp_gentype)[prop->chartype] == ucp_M || // <-- new line
     fc == CHAR_UNDERSCORE) == (Fop == OP_NOTPROP))

Is this the right way to do it? Is there anything else I need to consider? What else in the code would I need to change?

Best answer

A .NET \w construct matches:

Category    Description
Ll          Letter, Lowercase
Lu          Letter, Uppercase
Lt          Letter, Titlecase
Lo          Letter, Other
Lm          Letter, Modifier
Mn          Mark, Nonspacing
Nd          Number, Decimal Digit
Pc          Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), u+005F.

Note the differences: .NET \w does not match all numbers, only those in the Nd category, and within the M category it matches only the Mn subset.

Make sure you match these Unicode categories within your code and \w will behave as in .NET regex.

Use

case PT_WORD:
/* prop->chartype holds the particular category (ucp_Ll, ucp_Mn, ...),
   so it is compared directly; PRIV(ucp_gentype)[] yields only the
   general categories (ucp_L, ucp_M, ucp_N, ...). */
if ((prop->chartype == ucp_Ll ||
     prop->chartype == ucp_Lu ||
     prop->chartype == ucp_Lt ||
     prop->chartype == ucp_Lo ||
     prop->chartype == ucp_Lm ||
     prop->chartype == ucp_Mn ||
     prop->chartype == ucp_Nd ||
     prop->chartype == ucp_Pc) == (Fop == OP_NOTPROP))
  RRETURN(MATCH_NOMATCH);
break;

Note that you do not need to care about fc == CHAR_UNDERSCORE, because the underscore is part of \p{Pc}, and you cannot use just ucp_L because it also includes \p{LC}.
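
The category test described in the answer can be sketched in isolation, outside PCRE2, as follows. This is only an illustration: the `char_category` enum and both function names are hypothetical stand-ins for PCRE2's internal `ucp_*` values and are not part of the library. The second function mirrors the shape of the `PT_WORD` check, where the match fails when "is a word character" equals "the opcode is the negated form".

```c
#include <stdbool.h>

/* Hypothetical stand-in for PCRE2's internal particular-category values
   (ucp_Ll, ucp_Mn, ...). The real enum is private to PCRE2 and larger. */
typedef enum {
    CAT_Ll, CAT_Lu, CAT_Lt, CAT_Lo, CAT_Lm,  /* letters */
    CAT_Mn, CAT_Mc, CAT_Me,                  /* marks */
    CAT_Nd, CAT_Nl, CAT_No,                  /* numbers */
    CAT_Pc, CAT_Po,                          /* some punctuation */
    CAT_Zs                                   /* space separator */
} char_category;

/* Returns true for the categories the answer lists for .NET \w:
   Ll, Lu, Lt, Lo, Lm, Mn, Nd, Pc. */
static bool is_dotnet_word_category(char_category cat)
{
    switch (cat) {
    case CAT_Ll:
    case CAT_Lu:
    case CAT_Lt:
    case CAT_Lo:
    case CAT_Lm:
    case CAT_Mn:
    case CAT_Nd:
    case CAT_Pc:
        return true;
    default:
        return false;
    }
}

/* Same shape as the PT_WORD condition: returns true when the matcher
   should report "no match" for this character. `negated` plays the
   role of (Fop == OP_NOTPROP). */
static bool pt_word_fails(char_category cat, bool negated)
{
    return is_dotnet_word_category(cat) == negated;
}
```

For example, `is_dotnet_word_category(CAT_Mn)` is true while `is_dotnet_word_category(CAT_Mc)` and `is_dotnet_word_category(CAT_No)` are false, which reflects the two differences noted above: only Mn among the marks, and only Nd among the numbers.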

Regarding "c - How can I modify PCRE2 so that \w matches marks?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57477893/
