
java - Using regular expressions in Java to parse the sections of a text

Reposted. Author: 行者123. Updated: 2023-11-30 10:23:59

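I have a text file as a Java string. The structure of the text is shown below. I need to parse out each section that begins with the word "Clause". There are three clauses in this example, so after parsing I should end up with three strings, each starting with a clause and running up to, but not including, the next clause. The regex below gives me something close, but it has several flaws. First, it includes the word Clause from the next section. It also leaves out the last clause. Worst of all, it repeats all the clauses on every iteration: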

for (int i = 0; i < clauseCount - 1; i++) {
    String p2 = "(Clause(.*)Clause)";
    Pattern pattern2 = Pattern.compile(p2, Pattern.DOTALL);
    Matcher matcher2 = pattern2.matcher(extractedText);
    if (matcher2.find()) {
        System.out.println("Matched: " + matcher2.group());
    }
}

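Here is sample text with three clauses. However, there are multiple files, and each file has a different number of clauses. Could you please help? Thanks for your feedback.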

Title goes here

there is some text here:

Clause 1. In the following:

here is some text as well. The text that follows may include the name clause one or more times in the text here.

Clause 2. more text here (The text that follows may also include the name clause one or more times inside.):

(1) some text here;

(2) some text here;

(3) some text here;

Clause 3. text for new clause here. The text that follows may or may not include the name clause one or more times inside.:

(1) some text here;

(2) some text here;

(3) more some text here;

(4) some text here;

(5) and numbered text can go on;

(6) and may refer to other numbers like so: (3) and (4).

Notified on (some date here)

(and here is a signature)
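The flaws listed in the question can be reproduced with a small sketch; the class name `ClauseDiagnostics`, its two helpers and the short input string are hypothetical. With a greedy `(.*)` and `DOTALL`, the match stretches from the first "Clause" to the last one in the text. A reluctant `(.*?)` stops earlier, but the match still consumes the next "Clause", so the following `find()` cannot start a match at that heading and later clauses are skipped. The repetition has a separate cause: the original loop builds a fresh `Matcher` on every iteration and calls `find()` once, so each pass returns the same first match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseDiagnostics {

    // Greedy (.*) with DOTALL: the match runs from the FIRST "Clause"
    // to the LAST "Clause" in the text, swallowing everything between.
    static String greedyMatch(String text) {
        Matcher m = Pattern.compile("(Clause(.*)Clause)", Pattern.DOTALL).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Reluctant (.*?) stops at the next "Clause", but that word becomes part
    // of the match, so the next find() cannot begin a match at that heading.
    static List<String> reluctantMatches(String text) {
        Matcher m = Pattern.compile("(Clause(.*?)Clause)", Pattern.DOTALL).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical short stand-in for the real file.
        String text = "Clause 1. aaa Clause 2. bbb Clause 3. ccc";
        System.out.println("Greedy:    " + greedyMatch(text));
        System.out.println("Reluctant: " + reluctantMatches(text));
    }
}
```

Running it should report the greedy match `Clause 1. aaa Clause 2. bbb Clause` and a single reluctant match, `Clause 1. aaa Clause`: clauses 2 and 3 are never returned on their own.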

Best answer

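One way to match from the start of one clause up to the start of the next, without consuming the start of the next clause, is to use a lookahead. Consider matching with the following pattern: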

Clause\s*[0-9]+\.((?!Clause\s+[0-9]+\.).)*

This says: match Clause and a number, then any characters, one at a time, as long as what immediately follows is not Clause followed by a number and a period.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "Clause 1. Stuff is a Clause here\nClause 2. More Clause stuff is here.";
        String pattern = "Clause\\s*[0-9]+\\.((?!Clause\\s+[0-9]+\\.).)*";
        Pattern r = Pattern.compile(pattern, Pattern.DOTALL);
        Matcher m = r.matcher(input);

        while (m.find()) {
            System.out.println("Found value: " + m.group(0));
        }
    }
}

Output:

Found value: Clause 1. Stuff is a Clause here
Found value: Clause 2. More Clause stuff is here.

Demo here:

Rextester
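As a sketch of how the answer's pattern behaves on the question's own layout, the following collects the matches into a `List<String>` instead of printing them. The class name, the `extractClauses` helper and the shortened `SAMPLE` text are illustrative assumptions. The lowercase "clause" inside the body text is not a split point, because Java regexes are case-sensitive by default. Two other details are worth noticing: each match keeps the blank lines that precede the next heading (hence the `trim()`), and since nothing marks where the last clause ends, the "Notified on ..." line and the signature end up inside clause 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseExtractor {

    // The pattern from the answer: a "Clause N." heading, then any characters
    // as long as no other "Clause N." heading starts at that position.
    private static final Pattern CLAUSE = Pattern.compile(
            "Clause\\s*[0-9]+\\.((?!Clause\\s+[0-9]+\\.).)*", Pattern.DOTALL);

    // Shortened, hypothetical version of the sample text from the question.
    static final String SAMPLE = "Title goes here\n\n"
            + "there is some text here:\n\n"
            + "Clause 1. In the following:\n\n"
            + "here is some text as well. It may include the name clause.\n\n"
            + "Clause 2. more text here (may also include the name clause):\n\n"
            + "(1) some text here;\n\n"
            + "Clause 3. text for new clause here.\n\n"
            + "(1) some text here;\n\n"
            + "Notified on (some date here)\n\n"
            + "(and here is a signature)";

    // Collects every clause; trim() drops the blank lines that each match
    // carries up to the next heading.
    static List<String> extractClauses(String text) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(text);
        while (m.find()) {
            clauses.add(m.group().trim());
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> clauses = extractClauses(SAMPLE);
        System.out.println(clauses.size() + " clauses found");
        for (String clause : clauses) {
            System.out.println("---\n" + clause);
        }
    }
}
```

If the trailer should not belong to the last clause, it has to be cut off separately, for example at a known closing line; the pattern alone cannot tell where clause 3 stops.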

Regarding "java - Using regular expressions in Java to parse the sections of a text", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46781427/
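A different technique from the answer's, offered only as a hedged alternative: split the text at zero-width positions just before each heading, using `Pattern.split` with a lookahead. This sketch additionally assumes that every real heading starts at the beginning of a line, so the `(?m)^` anchor keeps a mid-line reference such as "as in Clause 1." from being treated as a new clause. The class name `ClauseSplitter` and the `splitClauses` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClauseSplitter {

    // Zero-width split point just before a "Clause N." heading at the start
    // of a line; (?m) lets ^ match after every line break.
    // Assumption: real headings always begin a line.
    private static final Pattern SPLIT_POINT =
            Pattern.compile("(?m)(?=^Clause\\s+[0-9]+\\.)");

    private static final Pattern HEADING = Pattern.compile("Clause\\s+[0-9]+\\.");

    static List<String> splitClauses(String text) {
        List<String> clauses = new ArrayList<>();
        for (String part : SPLIT_POINT.split(text)) {
            // The piece before the first heading is the preamble (title etc.);
            // keep only pieces that begin with a clause heading.
            if (HEADING.matcher(part).lookingAt()) {
                clauses.add(part.trim());
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        String text = "Title goes here\n\nClause 1. first\n\nClause 2. second\n\n"
                + "Clause 3. third, as in Clause 1. above";
        for (String clause : splitClauses(text)) {
            System.out.println("---\n" + clause);
        }
    }
}
```

Since Java 8, a zero-width match at the very start of the input does not produce a leading empty string, so the filter only has to drop the preamble that precedes the first heading.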
