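After reading polygenelubricants's series of articles on advanced regex techniques (particularly "How does this Java regex detect palindromes?"), I decided to try writing my own PCRE regex to parse palindromes, using recursion (in PHP).
What I came up with was: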
^(([a-z])(?1)\2|[a-z]?)$
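Unexpectedly, only strings in which a character is repeated 2^n - 1 times (a, aaa, aaaaaaa, a repeated 15 times) match the regular expression.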
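If I make the recursion optional:
^(([a-z])(?1)?\2|[a-z]?)$
(see www.ideone.com/D6lJR), it matches only strings in which a character is repeated 2^n times (that is, the empty string, a, aa, aaaa, aaaaaaaa, and so on).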
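Best answer
The phenomenon you are observing arises because PCRE subpattern recursion is atomic, unlike Perl's. The man page actually covers this problem in great detail: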
In PCRE (like Python, but unlike Perl), a recursive subpattern call is always treated as an atomic group. That is, once it has matched some of the subject string, it is never re-entered, even if it contains untried alternatives and there is a subsequent matching failure.
This can be illustrated by the following pattern, which purports to match a palindromic string that contains an odd number of characters (for example, "a", "aba", "abcba", "abcdcba"):
^(.|(.)(?1)\2)$
The idea is that it either matches a single character, or two identical characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE it does not if the pattern is longer than three characters.
Consider the subject string "abcba":
At the top level, the first character is matched, but as it is not at the end of the string, the first alternative fails; the second alternative is taken and the recursion kicks in. The recursive call to subpattern 1 successfully matches the next character ("b"). (Note that the beginning and end of line tests are not part of the recursion).
Back at the top level, the next character ("c") is compared with what subpattern 2 matched, which was "a". This fails. Because the recursion is treated as an atomic group, there are now no backtracking points, and so the entire match fails. (Perl is able, at this point, to re-enter the recursion and try the second alternative.) However, if the pattern is written with the alternatives in the other order, things are different:
^((.)(?1)\2|.)$
This time, the recursing alternative is tried first, and continues to recurse until it runs out of characters, at which point the recursion fails. But this time we do have another alternative to try at the higher level. That is the big difference: in the previous case the remaining alternative is at a deeper recursion level, which PCRE cannot use.
To change the pattern so that matches all palindromic strings, not just those with an odd number of characters, it is tempting to change the pattern to this:
^((.)(?1)\2|.?)$
Again, this works in Perl, but not in PCRE, and for the same reason. When a deeper recursion has matched a single character, it cannot be entered again in order to match an empty string. The solution is to separate the two cases, and write out the odd and even cases as alternatives at the higher level:
^(?:((.)(?1)\2|)|((.)(?3)\4|.))$
WARNING!!!
The palindrome-matching patterns above work only if the subject string does not start with a palindrome that is shorter than the entire string. For example, although "abcba" is correctly matched, if the subject is "ababa", PCRE finds the palindrome "aba" at the start, then fails at top level because the end of the string does not follow. Once again, it cannot jump back into the recursion to try other alternatives, so the entire match fails.
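The behavior the man page describes can be reproduced with a small hand-written model. The sketch below is not PCRE itself: it assumes that an atomic recursive call commits to the first end offset it finds and never offers another, and that only the top level may backtrack. The names call_group1 and matches are made up for this illustration, and the subject is assumed to contain no newlines (so that . matches every character).

```python
from typing import Optional


def call_group1(s: str, i: int, recursing_first: bool) -> Optional[int]:
    """Model one atomic call of group 1 starting at offset i.

    Returns the single end offset the call commits to, or None on failure.
    """
    n = len(s)
    if i >= n:
        return None  # both alternatives need at least one character
    if not recursing_first:
        # Group is .|(.)(?1)\2 : the single-character alternative succeeds
        # immediately, and the atomic call never reconsiders that choice.
        return i + 1
    # Group is (.)(?1)\2|. : try the recursing alternative first.
    j = call_group1(s, i + 1, recursing_first)
    if j is not None and j < n and s[j] == s[i]:
        return j + 1
    # Recursing alternative failed inside this call, so fall back to "."
    return i + 1


def matches(s: str, recursing_first: bool) -> bool:
    """Model the anchored pattern; only the top level may try both alternatives."""
    n = len(s)
    if n == 0:
        return False
    j = call_group1(s, 1, recursing_first)
    recursed = j is not None and j < n and s[j] == s[0] and j + 1 == n
    return recursed or n == 1


for subject in ("aba", "abcba", "ababa"):
    print(subject, matches(subject, False), matches(subject, True))
# prints:
# aba True True
# abcba False True
# ababa False False
```

Under these assumptions the model agrees with the quoted text: swapping the alternatives rescues "abcba", while "ababa" fails either way because the call at the second character commits too early.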
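Some background on atomicity:
- (?>…) is, in some flavors, the atomic grouping syntax
- Lookarounds, (?=…), (?!…), (?<=…), (?<!…), are all atomic
- Possessive quantifiers (e.g. a*+) are also atomic
The traces below use the first pattern from the question:
^(([a-z])(?1)\2|[a-z]?)$
They use the following notation:
- 1 means the character was captured in the first alternative
- 2 means the character was matched by the second alternative; if a 2 is not above a character, the zero-repetition option of ? was exercised
- \ means the character was matched by the backreference in the first alternative
- _ marks the bottom of a recursive branch
Consider "aaa" as input: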
      _
1 1 1 2
a a a      # This is the first bottom of the recursion,
           # now we go back to the third 1 and try to match \.
           # This fails, so the third 1 becomes 2.
    _
1 1 2
a a a      # Now we go back to the second 1 and try to match \.
           # This fails, so the second 1 becomes 2.
  _
1 2
a a a      # The second level matched! Now we go back to the first level...
_____
1 2 \
a a a      # Now the first 1 can match \, and the entire pattern matches!
Now consider "aaaaa":
          _
1 1 1 1 1 2
a a a a a  # Fifth 1 can't match \, so it becomes 2.
        _
1 1 1 1 2
a a a a a  # Fourth 1 can't match \, so it becomes 2.
    _____
1 1 1 2 \
a a a a a  # Here's a crucial point. The third 1 successfully matched.
           # Now we're back to the second 1 and try to match \, but this fails.
           # However, since PCRE recursion is atomic, the third 1 will NOT be
           # reentered to try 2. Instead, we try 2 on the second 1.
_____
1 2 \
a a a a a  # Anchors don't match, so the first 1 becomes 2, and then also the
           # anchors don't match, so the pattern fails to match.
Now consider "aa":
    _
1 1 2
a a
  _
1 2
a a        # The second level matched by taking the one-repetition option on ?.
           # We now go back to the first level, and we can't match \.
           # Since PCRE recursion is atomic, we can't go back to the second level
           # to try the zero-repetition option on ?.
_
2
a a        # Anchors don't match, trying the zero option on ? also doesn't help,
           # so the pattern fails to match!
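Once a recursion level has matched using the one-repetition option of ? in the second alternative, the zero-repetition option will not be attempted later (even if doing so could lead to a match), because PCRE subpattern recursion is atomic.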
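The effect of an atomic ? can be seen in isolation with Python's standard re module. The sketch below does not use recursion at all: it relies on the fact that lookaheads are atomic and uses the (?=(…))\1 idiom to emulate an atomic a?. The function names are invented for this illustration.

```python
import re


def backtracking_optional(subject: str) -> bool:
    # Ordinary a?: the engine may give the letter back, so on "a" it retries
    # with a? matching nothing and the final a matching the letter.
    return re.fullmatch(r"a?a", subject) is not None


def atomic_optional(subject: str) -> bool:
    # The lookahead commits to whatever a? matched first (captured as group 1),
    # and \1 then consumes it. The zero-repetition option is never retried.
    return re.fullmatch(r"(?=(a?))\1a", subject) is not None


print(backtracking_optional("a"), atomic_optional("a"))
# prints True False
print(backtracking_optional("aa"), atomic_optional("aa"))
# prints True True
```

This is the same situation as the "aa" trace above: once the one-repetition option has been taken inside an atomic construct, a later failure cannot reach back in to try zero repetitions.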
Now consider "aaaaaaa":
              _
1 1 1 1 1 1 1 2
a a a a a a a
            _
1 1 1 1 1 1 2
a a a a a a a
        _____
1 1 1 1 1 2 \
a a a a a a a  # A crucial point: the fifth level matched and now the fourth
               # level can't match \, but it does NOT reenter the fifth level to
               # try 2. Instead, the fourth level tries 2.
    _____
1 1 1 2 \
a a a a a a a
  _________
1 1 1 2 \ \
a a a a a a a
_____________
1 1 1 2 \ \ \
a a a a a a a  # Entire pattern is a match!
Now consider "abcba":
          _
1 1 1 1 1 2
a b c b a
        _
1 1 1 1 2
a b c b a
1 1 1 2
a b c b a      # Third level attempts \, but c does not match a!
               # So we go back to the third 1 and try 2.
  _____
1 1 2 \
a b c b a
_________
1 1 2 \ \
a b c b a      # Entire pattern is a match!
Note that the pattern does match "abcba" (as seen on ideone.com). However, it cannot match "ababa", nor can it match "aaaaa" (see the WARNING from the man page!), because subpattern recursion in PCRE is atomic.
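The traces can be turned into a small model that reproduces both observations from the question. This is a sketch under stated assumptions rather than a run of PCRE: it assumes each recursive call commits to its first successful end offset, that (?1)? greedily tries the call before skipping it, and that only the top level may backtrack. The names is_letter, call_group1 and matches are hypothetical. A live engine may behave differently; as far as I know, PCRE2 10.30 and later no longer treat recursive calls as atomic.

```python
def is_letter(ch: str) -> bool:
    return "a" <= ch <= "z"


def call_group1(s: str, i: int, optional: bool) -> int:
    """Model one atomic call of group 1 at offset i; return its only end offset.

    optional=False models ([a-z])(?1)\\2|[a-z]?
    optional=True  models ([a-z])(?1)?\\2|[a-z]?
    """
    n = len(s)
    if i < n and is_letter(s[i]):
        # First alternative: capture s[i], make the atomic recursive call,
        # then the backreference must match right after it.
        j = call_group1(s, i + 1, optional)
        if j < n and s[j] == s[i]:
            return j + 1
        # (?1)? only: skip the call and try the backreference immediately.
        if optional and i + 1 < n and s[i + 1] == s[i]:
            return i + 2
        # Second alternative: [a-z]? takes one letter.
        return i + 1
    # Second alternative: [a-z]? takes nothing.
    return i


def matches(s: str, optional: bool = False) -> bool:
    """Model the whole anchored pattern; only this level may backtrack."""
    n = len(s)
    if n == 0:
        return True  # [a-z]? matches the empty string, then $
    if not is_letter(s[0]):
        return False
    j = call_group1(s, 1, optional)
    if j < n and s[j] == s[0] and j + 1 == n:
        return True
    if optional and n == 2 and s[1] == s[0]:
        return True  # top level skipped the call: letter, backreference, $
    return n == 1  # [a-z]? takes the only letter


for optional in (False, True):
    print([k for k in range(17) if matches("a" * k, optional)])
# prints [0, 1, 3, 7, 15] and then [0, 1, 2, 4, 8, 16]
print(matches("abcba"), matches("ababa"), matches("aaaaa"))
# prints True False False
```

Under these assumptions the first pattern accepts runs of length 2^n - 1 and the optional-recursion variant accepts runs of length 2^n (plus the empty string), which is what the question reports.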
Regarding "regex - Why will this recursive regex only match when a character repeats 2^n - 1 times?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/3738631/