gpt4 book ai didi

java - Java 中正则表达式回溯直到溢出

转载 作者:太空宇宙 更新时间:2023-11-04 11:47:48 26 4
gpt4 key购买 nike

以下表达式:

^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[\/\/]*\s*(end of)*\s*FEATURE)+?$

运行我编译的 .Jar 文件时覆盖匹配缓冲区。

匹配字符串可以类似于:

this is a junk line

#ifdef FEATURE
#endif // end of FEATURE

this is a junk line

#ifdef FEATURE

this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h

#endif FEATURE

this is a junk line

因此,粗体字符串应该匹配。错误如下:

   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)

欢迎任何避免策略/改进表达的回溯。我已经尝试过原子组 (?>) 但由于某种原因并没有简化。

代码如下:

公共(public)字符串条(字符串文本){

    ArrayList<String> patterns=new ArrayList<String>();
patterns=readFile("Disabled_Features.txt");
for(int i = 0; i < patterns.size(); ++i)
{

Pattern todoPattern = Pattern.compile("^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$",Pattern.MULTILINE);

Matcher m = todoPattern.matcher(text);
text = m.replaceAll("");
}
return text;
}

最佳答案

我已经尝试过@Wiktor编写的代码,效果很好

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {
public static void main(String[] args) {
String text = "this is a junk line\n" +
"\n" +
"#ifdef FEATURE \n" +
"#endif // end of FEATURE\n" +
"\n" +
"this is a junk line\n" +
"\n" +
"#ifdef FEATURE\n" +
"\n" +
"this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n" +
"\n" +
"#endif FEATURE\n" +
"\n" +
"this is a junk line";

// this version does not use Pattern.MULTILINE, this should reduce the backtraking
Matcher matcher2 = Pattern.compile("\\n#ifdef FEATURE((?:\\r?\\n(?!#endif (?:// end of )?FEATURE).*)*)\\r?\\n#endif (?:// end of )?FEATURE").matcher(text);
while (matcher2.find()) {
System.out.println(matcher2.group());
}

}
}

这让我认为您的问题是由于输入文件的大小造成的。

因此,如果您的文件太大,您可以将输入实现为 CharSequence,这样您就可以包装大型文本文件。为什么?因为从 Pattern 构建 Matcher 需要 CharSequence 作为参数。

https://github.com/fge/largetext

关于java - Java 中正则表达式回溯直到溢出,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42158364/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com