gpt4 book ai didi

java - Java中没有明显最大长度的正则表达式后视

转载 作者:IT老高 更新时间:2023-10-28 20:28:36 25 4
gpt4 key购买 nike

我一直认为 Java 的 regex-API(以及许多其他语言)中的 look-behind 断言必须具有明显的长度。因此,look-behinds 中不允许使用 STAR 和 PLUS 量词。

优秀的在线资源regular-expressions.info似乎证实了(部分)我的假设:

"[...] Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths. Unfortunately, the JDK 1.4 and 1.5 have some bugs when you use alternation inside lookbehind. These were fixed in JDK 1.6. [...]"

-- http://www.regular-expressions.info/lookaround.html

只要后视中字符范围的总长度小于或等于 Integer.MAX_VALUE,就可以使用大括号。所以这些正则表达式是有效的:

"(?<=a{0,"   +(Integer.MAX_VALUE)   + "})B"
"(?<=Ca{0," +(Integer.MAX_VALUE-1) + "})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-2) + "})B"

但这些不是:

"(?<=Ca{0,"  +(Integer.MAX_VALUE)   +"})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-1) +"})B"

但是,我不明白以下内容:

当我在 look-behind 中使用 * 和 + 量词运行测试时,一切正常(请参阅输出 Test 1Test 2 )。

但是,当我在 Test 1Test 2look-behind 开头添加一个字符时,它会中断 (见输出Test 3)。

使 Test 3 中的贪婪 * 不情愿没有效果,它仍然会中断(参见 Test 4)。

这是测试工具:

public class Main {

private static String testFind(String regex, String input) {
try {
boolean returned = java.util.regex.Pattern.compile(regex).matcher(input).find();
return "testFind : Valid -> regex = "+regex+", input = "+input+", returned = "+returned;
} catch(Exception e) {
return "testFind : Invalid -> "+regex+", "+e.getMessage();
}
}

private static String testReplaceAll(String regex, String input) {
try {
String returned = input.replaceAll(regex, "FOO");
return "testReplaceAll : Valid -> regex = "+regex+", input = "+input+", returned = "+returned;
} catch(Exception e) {
return "testReplaceAll : Invalid -> "+regex+", "+e.getMessage();
}
}

private static String testSplit(String regex, String input) {
try {
String[] returned = input.split(regex);
return "testSplit : Valid -> regex = "+regex+", input = "+input+", returned = "+java.util.Arrays.toString(returned);
} catch(Exception e) {
return "testSplit : Invalid -> "+regex+", "+e.getMessage();
}
}

public static void main(String[] args) {
String[] regexes = {"(?<=a*)B", "(?<=a+)B", "(?<=Ca*)B", "(?<=Ca*?)B"};
String input = "CaaaaaaaaaaaaaaaBaaaa";
int test = 0;
for(String regex : regexes) {
test++;
System.out.println("********************** Test "+test+" **********************");
System.out.println(" "+testFind(regex, input));
System.out.println(" "+testReplaceAll(regex, input));
System.out.println(" "+testSplit(regex, input));
System.out.println();
}
}
}

输出:

********************** Test 1 **********************
testFind : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = true
testReplaceAll : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = CaaaaaaaaaaaaaaaFOOaaaa
testSplit : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 2 **********************
testFind : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = true
testReplaceAll : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = CaaaaaaaaaaaaaaaFOOaaaa
testSplit : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 3 **********************
testFind : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
^
testReplaceAll : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
^
testSplit : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
^

********************** Test 4 **********************
testFind : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
^
testReplaceAll : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
^
testSplit : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
^

我的问题可能很明显,但我还是会问:谁能向我解释为什么 Test 12 失败,以及 测试 34 没有? 我原以为它们都会失败,而不是一半工作,一半失败。

谢谢。

PS。我正在使用:Java 版本 1.6.0_14

最佳答案

查看 Pattern.java 的源代码可以发现,'*' 和 '+' 是作为 Curly 的实例实现的(这是为 curl 运算符创建的对象)。所以,

a*

实现为

a{0,0x7FFFFFFF}

a+

实现为

a{1,0x7FFFFFFF}

这就是为什么你看到 curl 和星星的行为完全相同的原因。

关于java - Java中没有明显最大长度的正则表达式后视,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1536915/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com