
java - Regular expression gives an overflow error

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 08:17:00

Sample code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) {
        String data = "Shyam and you. You are 2.3 km away from home. Lakshmi and you. Ram and you. You are Mike. ";
        Pattern pattern = Pattern.compile(
                "\\s*((?:[^\\.]|(?:\\w+\\.)+\\w)*are.*?)(?:\\.\\s|\\.$)",
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(data);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

Output:

You are 2.3 km away from home. 

You are Mike.

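I get the expected output when I run the code above. The problem is that when the same regular expression is tested against a larger string, it fails with an overflow error. I searched for similar reports and found that alternation such as (A|B)* in a regular expression causes this problem. Is there any way to solve it? Please help.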
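To make the symptom concrete, here is a minimal sketch of how a repeated alternation can be probed on short and long inputs. The class name AlternationOverflowSketch, the helper names, the pattern (?:a|b)* and the input sizes are all assumptions chosen for illustration, not taken from the question. Whether the long input actually overflows depends on the JVM version and its thread stack size, so the sketch only reports what happened.

```java
import java.util.regex.Pattern;

class AlternationOverflowSketch {
    // Runs a full match and reports the outcome as a label instead of crashing.
    // StackOverflowError is an Error, not an Exception, so it is caught explicitly.
    static String tryMatch(Pattern pattern, String input) {
        try {
            return pattern.matcher(input).matches() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    // Builds a string consisting of n copies of s (hypothetical helper).
    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder(s.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative stand-in for the (A|B)* shape mentioned in the question.
        Pattern alternation = Pattern.compile("(?:a|b)*");

        // Short input: the repetition stays shallow.
        System.out.println(tryMatch(alternation, repeat("ab", 10)));

        // Long input: the result depends on the JVM and its stack size.
        System.out.println(tryMatch(alternation, repeat("ab", 200_000)));
    }
}
```

If the second call reports StackOverflowError on your JVM, that matches the behaviour described in the question: the pattern is fine on small input and only fails once the repeated group has to run many times.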

Best answer

I have tried to refactor your regular expression to avoid backtracking. Can you try this one:

Pattern pattern = Pattern.compile("(?>[^.]|(?:\\w+\\.)+\\w)+\\sare\\s.*?(?>\\.\\s|\\.$)",
Pattern.DOTALL);
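As a quick check, the suggested pattern can be run against the sample string from the question. This is only a sketch: the class name AtomicRegexOnSample, the constant and the helper findSentences are hypothetical, and trimming each match is an added convenience that the original code does not perform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomicRegexOnSample {
    // The pattern suggested in the answer, copied unchanged.
    static final Pattern SENTENCE_WITH_ARE = Pattern.compile(
            "(?>[^.]|(?:\\w+\\.)+\\w)+\\sare\\s.*?(?>\\.\\s|\\.$)",
            Pattern.DOTALL);

    // Collects every match, trimmed of surrounding whitespace (an assumption
    // made here for readability; the question prints group(0) untrimmed).
    static List<String> findSentences(String data) {
        List<String> sentences = new ArrayList<>();
        Matcher matcher = SENTENCE_WITH_ARE.matcher(data);
        while (matcher.find()) {
            sentences.add(matcher.group().trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String data = "Shyam and you. You are 2.3 km away from home. "
                + "Lakshmi and you. Ram and you. You are Mike. ";
        for (String sentence : findSentences(data)) {
            System.out.println(sentence);
        }
    }
}
```

On the sample data this should find the same two sentences as the original pattern. It does not by itself show that the overflow is gone on larger inputs, so it is worth testing against the real data.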

(?>group) is called an atomic group.

According to: http://www.regular-expressions.info/atomic.html

Atomic Grouping

An atomic group is a group that, when the regex engine exits from it, automatically throws away all backtracking positions remembered by any tokens inside the group. Atomic groups are non-capturing. The syntax is (?>group).
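The effect of discarding backtracking positions can be seen by comparing an atomic group with an ordinary non-capturing group. The sketch below uses a small textbook-style pattern pair; the class name AtomicGroupDemo and the helper fullMatch are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Returns true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String nonAtomic = "a(?:bc|b)c"; // ordinary non-capturing group
        String atomic = "a(?>bc|b)c";    // atomic group

        // Non-atomic: the group first takes "bc", the trailing "c" then fails,
        // so the engine backtracks into the group and retries with "b".
        System.out.println(fullMatch(nonAtomic, "abc")); // true

        // Atomic: once the group has matched "bc", the alternative "b" is
        // thrown away, so the trailing "c" has nothing left to match.
        System.out.println(fullMatch(atomic, "abc"));    // false

        // With an extra "c" the atomic version succeeds without backtracking.
        System.out.println(fullMatch(atomic, "abcc"));   // true
    }
}
```

Because an atomic group keeps no backtracking positions once it has been exited, the engine has less state to revisit, which is the property the answer relies on.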

Regarding "java - Regular expression gives an overflow error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18490835/
