
Java regex not matching

Reposted. Author: 行者123. Updated: 2023-11-30 07:50:13

I'm trying to write a program that returns all the text between \begin{theorem} and \end{theorem}, and between \begin{proof} and \end{proof}.

Using regular expressions seems natural, but because there are many potential metacharacters, they have to be escaped.
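For real LaTeX input (where the braces are not preceded by backslashes), the escaping happens in two layers: once for the regex engine and once more for the Java string literal. The minimal sketch below, assuming the input is plain \begin{theorem} ... \end{theorem} on one line, shows the hand-escaped pattern; the class name EscapingLayers and the helper firstTheorem are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapingLayers {
    // Regex layer: \\ matches one literal backslash; \{ and \} match literal braces
    // (an unescaped { right after "begin" makes Java report an illegal repetition).
    // Java layer: every backslash in the regex must be doubled again in the literal.
    static final Pattern THEOREM =
            Pattern.compile("\\\\begin\\{theorem\\}(.+?)\\\\end\\{theorem\\}");

    // Hypothetical helper: returns the trimmed body of the first theorem, or null if none
    static String firstTheorem(String latex) {
        Matcher m = THEOREM.matcher(latex);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // The Java literal "\\begin{theorem}" is the LaTeX text \begin{theorem}
        String latex = "\\begin{theorem} Hello, World! \\end{theorem}";
        System.out.println(firstTheorem(latex)); // prints Hello, World!
    }
}
```

Every backslash you want the engine to see literally costs four characters in the Java source, which is why this gets hard to read quickly.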

Here is the code I wrote:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatexTheoremProofExtractor {

    // This is the LaTeX source that will be processed
    private String source = null;

    // These are the lists of theorems and proofs that are extracted, respectively
    private ArrayList<String> theorems = null;
    private ArrayList<String> proofs = null;

    // These are the patterns to match theorems and proofs, respectively
    private static final Pattern THEOREM_REGEX = Pattern.compile("\\begin\\{theorem\\}(.+?)\\end\\{theorem\\}");
    private static final Pattern PROOF_REGEX = Pattern.compile("\\begin\\{proof\\}(.+?)\\end\\{proof\\}");

    LatexTheoremProofExtractor(String source) {
        this.source = source;
    }

    public void parse() {
        extractEntity("theorem");
        extractEntity("proof");
    }

    private void extractTheorems() {
        if (theorems != null) {
            return;
        }

        theorems = new ArrayList<String>();

        final Matcher matcher = THEOREM_REGEX.matcher(source);
        while (matcher.find()) {
            // group(1) is the text captured between the delimiters
            theorems.add(matcher.group(1));
        }
    }

    private void extractProofs() {
        if (proofs != null) {
            return;
        }

        proofs = new ArrayList<String>();

        final Matcher matcher = PROOF_REGEX.matcher(source);
        while (matcher.find()) {
            proofs.add(matcher.group(1));
        }
    }

    private void extractEntity(final String entity) {
        if (entity.equals("theorem")) {
            extractTheorems();
        } else if (entity.equals("proof")) {
            extractProofs();
        } else {
            throw new IllegalArgumentException("Unknown entity: " + entity);
        }
    }

    public ArrayList<String> getTheorems() {
        return theorems;
    }

}

Here is my failing test:

@Test
public void testTheoremExtractor() {
    String source = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";
    LatexTheoremProofExtractor extractor = new LatexTheoremProofExtractor(source);
    extractor.parse();
    ArrayList<String> theorems = extractor.getTheorems();
    // JUnit's assertEquals takes the expected value first, then the actual value
    assertEquals("Hello, World!", theorems.get(0).trim());
}

My test shows that I expect exactly one match here, and that it should be "Hello, World!" (after trimming).

Currently theorems is an empty, non-null list, so my Matcher is not matching the pattern. Can anyone help me understand why?

Thanks, erip

Best answer

Here is the update you need to make to your code: the two regexes used by the extractor methods should be changed to

private static final Pattern THEOREM_REGEX = Pattern.compile(Pattern.quote("\\begin\\{theorem\\}") + "(.+?)" + Pattern.quote("\\end\\{theorem\\}"));
private static final Pattern PROOF_REGEX = Pattern.compile(Pattern.quote("\\begin\\{proof\\}") + "(.+?)" + Pattern.quote("\\end\\{proof\\}"));

The result will be "Hello, World!". See IDEONE demo.
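As a self-contained stand-in for the linked demo, the sketch below applies the quoted THEOREM_REGEX to the same input the failing test uses. The class name QuotedPatternDemo and the extract helper are hypothetical; the pattern itself is copied from the fix above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedPatternDemo {
    // The fix from the answer: each delimiter goes through Pattern.quote, so the
    // backslashes and braces inside it are matched literally.
    static final Pattern THEOREM_REGEX = Pattern.compile(
            Pattern.quote("\\begin\\{theorem\\}") + "(.+?)" + Pattern.quote("\\end\\{theorem\\}"));

    // Hypothetical helper mirroring extractTheorems(): collects every group(1)
    static List<String> extract(Pattern pattern, String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(source);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same input as the failing test: the text literally contains \{ and \}
        String source = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";
        List<String> theorems = extract(THEOREM_REGEX, source);
        System.out.println(theorems.get(0).trim()); // prints Hello, World!
    }
}
```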

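The string you have is actually \begin\{theorem\} Hello, World! \end\{theorem\}. A literal backslash is doubled inside a Java string literal, and to match a literal backslash with a Java regex you need \\\\ in the source. Your original pattern reaches the regex engine as \begin\{theorem\}(.+?)\end\{theorem\}, where \b is a word boundary, \e is the escape character and \{ is a plain brace, so it never matches a backslash at all. To avoid backslash hell, Pattern.quote can help: it tells the regex engine to treat everything inside it as literal.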
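To see the diagnosis directly, this sketch compiles the question's original pattern next to a hand-escaped equivalent of the Pattern.quote fix and tries both on the test input. The class name WhyNoMatch and the constant HAND_ESCAPED are hypothetical, introduced only for this comparison.

```java
import java.util.regex.Pattern;

public class WhyNoMatch {
    static final String SOURCE = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";

    // The question's pattern: the engine sees \begin\{theorem\}(.+?)\end\{theorem\}.
    // \b is a word boundary, \e is the escape character U+001B and \{ is a plain {,
    // so nothing in it requires a literal backslash in the input.
    static final Pattern ORIGINAL =
            Pattern.compile("\\begin\\{theorem\\}(.+?)\\end\\{theorem\\}");

    // Hand-escaped version: \\\\ in Java source becomes \\ in the regex, which
    // matches one literal backslash; \\\\\\{ becomes \\\{, i.e. a backslash then {.
    static final Pattern HAND_ESCAPED = Pattern.compile(
            "\\\\begin\\\\\\{theorem\\\\\\}(.+?)\\\\end\\\\\\{theorem\\\\\\}");

    public static void main(String[] args) {
        System.out.println(ORIGINAL.matcher(SOURCE).find());     // prints false
        System.out.println(HAND_ESCAPED.matcher(SOURCE).find()); // prints true
    }
}
```

The hand-escaped pattern works, but it is exactly the kind of backslash hell that Pattern.quote avoids.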

More details about Pattern.quote can be found in the documentation:

Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.

Metacharacters or escape sequences in the input sequence will be given no special meaning.
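In practice Pattern.quote wraps its argument in \Q...\E. The sketch below, assuming real LaTeX input (plain \begin{theorem}, no backslashes before the braces, no nested environments of the same name), builds the pattern from the environment name and adds Pattern.DOTALL so that (.+?) can span line breaks, which the question's patterns cannot. The names EnvironmentExtractor, environmentPattern and extract are hypothetical and not part of the question or the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvironmentExtractor {
    // Hypothetical helper: pattern for \begin{env} ... \end{env}, delimiters quoted literally.
    // DOTALL lets . match line terminators, so multi-line bodies are captured too.
    static Pattern environmentPattern(String env) {
        String begin = "\\begin{" + env + "}";
        String end = "\\end{" + env + "}";
        return Pattern.compile(Pattern.quote(begin) + "(.+?)" + Pattern.quote(end), Pattern.DOTALL);
    }

    // Hypothetical helper: trimmed bodies of every env environment in source
    static List<String> extract(String env, String source) {
        List<String> bodies = new ArrayList<>();
        Matcher matcher = environmentPattern(env).matcher(source);
        while (matcher.find()) {
            bodies.add(matcher.group(1).trim());
        }
        return bodies;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("\\begin{theorem}")); // prints \Q\begin{theorem}\E

        String source = "\\begin{theorem}\nHello, World!\n\\end{theorem}\n"
                + "\\begin{proof}\nTrivial.\n\\end{proof}";
        System.out.println(extract("theorem", source)); // prints [Hello, World!]
        System.out.println(extract("proof", source));   // prints [Trivial.]
    }
}
```

Building both patterns from one helper also removes the need for separate extractTheorems and extractProofs methods with near-identical bodies.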

For "Java regex not matching", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33402127/
