
java - Regex fails in Java

Reposted. Author: 行者123. Updated: 2023-11-30 11:18:13

I'm having trouble developing a regex pattern in Java to split a string of university timetable classes.

An example string looks like this:

"CIVL4401_SEM-1:Laboratory_Lab1: 05:11:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) CIVL4401_SEM-1:Laboratory_Lab2: 07:19:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) "

(all on one line)

Using the regex pattern:

final String classregex = "(?<=\\(cont\\)\\s|\\[Pref \\d{1,2}\\]\\s)";

it should split into exactly two class entries:

"CIVL4401_SEM-1:Laboratory_Lab1: 05:11:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) "
"CIVL4401_SEM-1:Laboratory_Lab2: 07:19:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) "

The zero-length lookbehind is intentional; I want to keep all of the data.

Instead, I get:

"CIVL4401_SEM-1:Laboratory_Lab1: 05:11:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] "
"(cont) "
"CIVL4401_SEM-1:Laboratory_Lab2: 07:19:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] "
"(cont) "

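I'm fairly sure I understand why this happens: it first matches "[Pref \d]" and extracts that string, then continues through the rest and immediately finds "(cont)", and so on.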
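A minimal sketch that reproduces the behavior described above, assuming the string is split with plain `String.split`. The class and method names are hypothetical; the pattern and the input are taken from the question. The lookbehind is satisfied at two positions in each entry, right after "[Pref 1] " and right after "(cont) ", so each entry is cut twice.

```java
class NaiveSplitDemo {
    // The pattern from the question, unchanged.
    static final String CLASS_REGEX = "(?<=\\(cont\\)\\s|\\[Pref \\d{1,2}\\]\\s)";

    // The example string from the question, on one line.
    static final String INPUT =
        "CIVL4401_SEM-1:Laboratory_Lab1: 05:11:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) "
      + "CIVL4401_SEM-1:Laboratory_Lab2: 07:19:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) ";

    // Hypothetical helper: split a timetable string on the zero-width pattern.
    static String[] splitEntries(String timetable) {
        return timetable.split(CLASS_REGEX);
    }

    public static void main(String[] args) {
        // Prints four pieces: every "(cont) " ends up on its own.
        for (String piece : splitEntries(INPUT)) {
            System.out.println("\"" + piece + "\"");
        }
    }
}
```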

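Note that some timetable classes have no "(cont)" in them, which is why the regex includes the "[Pref \d]" part.

Is there some way to control the order in which the Java regex engine works? I want it to try to match the "(cont)" part before it tries to match the "[Pref \d]" part. My guess is that this needs some complicated combination of lookahead and lookbehind, and I'm not sure how to go about it.

If this can't be done, I'll write a fix-up function to handle it instead. Thanks.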

Best answer

How about this:

(?<=\(cont\)\s|\[Pref\s\d\]\s(?!\(cont\)))

It also checks that [Pref \d] is not followed by (cont).

As a Java string literal, this would be:

(?<=\\(cont\\)\\s|\\[Pref\\s\\d\\]\\s(?!\\(cont\\)))

But I was surprised to find that even this works:

(?<=\\(cont\\)\\s|\\[Pref\\s\\d{1,2}\\]\\s(?!\\(cont\\)))
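A minimal sketch of how this pattern might be used with `String.split`, under the assumption that every entry ends in either "[Pref n] " or "[Pref n] (cont) ". The class name, the helper method, and the second input (`NO_CONT_INPUT`, an entry without "(cont)") are made up for illustration; only the pattern and the first input come from the thread.

```java
class FixedSplitDemo {
    // The pattern from the answer: split after "(cont) ", or after
    // "[Pref n] " only when "(cont)" does not follow immediately.
    static final String CLASS_REGEX =
        "(?<=\\(cont\\)\\s|\\[Pref\\s\\d{1,2}\\]\\s(?!\\(cont\\)))";

    // The example string from the question.
    static final String INPUT =
        "CIVL4401_SEM-1:Laboratory_Lab1: 05:11:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) "
      + "CIVL4401_SEM-1:Laboratory_Lab2: 07:19:Engineering - Civil & Mechanical: Soils Lab (G99): [Pref 1] (cont) ";

    // Hypothetical input: the first entry has no "(cont)".
    static final String NO_CONT_INPUT =
        "CIVL1111_SEM-1:Lecture_L1: 01:02:Room A: [Pref 2] "
      + "CIVL4401_SEM-1:Laboratory_Lab1: 05:11:Soils Lab (G99): [Pref 1] (cont) ";

    static String[] splitEntries(String timetable) {
        return timetable.split(CLASS_REGEX);
    }

    public static void main(String[] args) {
        // Two pieces for each input; "(cont) " stays attached to its entry.
        for (String piece : splitEntries(INPUT)) {
            System.out.println("\"" + piece + "\"");
        }
        for (String piece : splitEntries(NO_CONT_INPUT)) {
            System.out.println("\"" + piece + "\"");
        }
    }
}
```

The lookbehind also matches at the very end of the string, after the final "(cont) ", but `split` with the default limit drops the trailing empty piece that this produces.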

As the OP mentioned in the comments, Java appears to support finite-range quantifiers inside lookbehind. Here is an excerpt from regular-expressions.info:

Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java determines the minimum and maximum possible lengths of the lookbehind. The lookbehind in the regex (?<!ab{2,4}c{3,5}d)test has 6 possible lengths. It can be between 7 to 11 characters long. When Java (version 6 or later) tries to match the lookbehind, it first steps back the minimum number of characters (7 in this example) in the string and then evaluates the regex inside the lookbehind as usual, from left to right. If it fails, Java steps back one more character and tries again. If the lookbehind continues to fail, Java continues to step back until the lookbehind either matches or it has stepped back the maximum number of characters (11 in this example). This repeated stepping back through the subject string kills performance when the number of possible lengths of the lookbehind grows. Keep this in mind. Don't choose an arbitrarily large maximum number of repetitions to work around the lack of infinite quantifiers inside lookbehind. Java 4 and 5 have bugs that cause lookbehind with alternation or variable quantifiers to fail when it should succeed in some situations. These bugs were fixed in Java 6.
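The pattern quoted in the excerpt can be tried directly. This is only a sketch: the class name, the helper, and the test strings are made up for illustration, and it assumes Java 6 or later. The lookbehind body is between 7 and 11 characters long, so "test" is rejected only when a run of that shape sits immediately before it.

```java
import java.util.regex.Pattern;

class BoundedLookbehindDemo {
    // The example pattern from the excerpt: a negative lookbehind that
    // uses bounded curly-brace quantifiers.
    static final Pattern P = Pattern.compile("(?<!ab{2,4}c{3,5}d)test");

    // Hypothetical helper: true if "test" occurs somewhere in s without
    // being preceded by a, 2 to 4 b's, 3 to 5 c's, and d.
    static boolean findsTest(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(findsTest("abbcccdtest"));     // false: shortest prefix, 7 chars
        System.out.println(findsTest("abbbbcccccdtest")); // false: longest prefix, 11 chars
        System.out.println(findsTest("abcccdtest"));      // true: only one b
        System.out.println(findsTest("xtest"));           // true: nothing like the prefix
    }
}
```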

For "java - Regex fails in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23949338/
