
c++ - Matching a text range with C++11 regular expressions

Reposted · Author: 搜寻专家 · Updated: 2023-10-31 01:38:59

I'm experimenting with regular expressions in C++. Here is some code:

#include <iostream>
#include <regex>
#include <string>

int main (int argc, char *argv[]) {
    std::regex pattern("[a-z]+", std::regex_constants::icase);
    std::regex pattern2("excelsior", std::regex_constants::icase);
    std::string text = "EXCELSIOR";

    if (std::regex_match(text, pattern)) std::cout << "works" << std::endl;
    else std::cout << "doesn't work" << std::endl;

    if (std::regex_match(text, pattern2)) std::cout << "works" << std::endl;
    else std::cout << "doesn't work" << std::endl;

    return 0;
}

Now, as I understand it, both matches should print works, but the first one prints doesn't work, while the second prints works as expected. Why?

Best answer

According to the rules described in [re.grammar], we have:

— During matching of a regular expression finite state machine against a sequence of characters, two characters c and d are compared using the following rules:
1. if (flags() & regex_constants::icase) the two characters are equal if traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d);
2. otherwise, if flags() & regex_constants::collate the two characters are equal if traits_inst.translate(c) == traits_inst.translate(d);
3. otherwise, the two characters are equal if c == d.

This applies to your pattern2: we are matching a sequence of characters and flags() & icase is set, so we do a nocase comparison. Since every character in the sequence matches, it "works".

For pattern, however, we don't have a sequence of characters to compare; we have a range. So we use this rule instead:

— During matching of a regular expression finite state machine against a sequence of characters, comparison of a collating element range c1-c2 against a character c is conducted as follows: if flags() & regex_constants::collate is false then the character c is matched if c1 <= c && c <= c2, otherwise c is matched in accordance with the following algorithm:

string_type str1 = string_type(1,
  flags() & icase ?
    traits_inst.translate_nocase(c1) : traits_inst.translate(c1));
string_type str2 = string_type(1,
  flags() & icase ?
    traits_inst.translate_nocase(c2) : traits_inst.translate(c2));
string_type str = string_type(1,
  flags() & icase ?
    traits_inst.translate_nocase(c) : traits_inst.translate(c));
return traits_inst.transform(str1.begin(), str1.end())
    <= traits_inst.transform(str.begin(), str.end())
  && traits_inst.transform(str.begin(), str.end())
    <= traits_inst.transform(str2.begin(), str2.end());

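Because you did not set collate, the characters are matched literally against the range a-z (c1 <= c && c <= c2). icase is not taken into account here, which is why it "doesn't work". If you do supply collate, however: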

std::regex pattern("[a-z]+", 
std::regex_constants::icase | std::regex_constants::collate);

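then we use the algorithm described above with a nocase comparison, and the result will be "works". Both compilers are correct, although I find the expected behavior in this case confusing.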

For a similar question about c++ - matching a text range with C++11 regular expressions, see Stack Overflow: https://stackoverflow.com/questions/31994460/
