
java - Capturing multiple text blocks with a regular expression in Java

Reposted. Author: 行者123. Updated: 2023-12-01 14:13:06

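What regular expression should I use to extract multiple text blocks that are separated by headers, where the headers themselves must also be parsed? For example: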

some text info before message sequence
============
first message header that should be parsed (may contain = character)
============
first multiline
message body that
should also be parsed
(may contain = character)
============
second message header that should be parsed
============
second multiline
message body that
should also be parsed
... and so on

I tried to use:

String regex = "^=+$\n"+
"^(.+)$\n"+
"^=+$\n"+
"((?s:(?!(^=.+)).+))";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);

But ((?s:(?!(^=.+)).+)) swallows the second message as well. Here is a test that shows the problem:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Assert;
import org.junit.Test;

public class ParsingTest {
    @Test
    public void test() {
        String fstMsgHeader = "first message header that should be parsed (may contain = character)";
        String fstMsgBody = "first multiline\n"+
                "message body that\n"+
                "should also be parsed\n"+
                "(may contain = character)";
        String sndMsgHeader = "second message header that should be parsed";
        String sndMsgBody = "second multiline\n"+
                "message body that\n"+
                "should also be parsed\n"+
                "... and so on";
        String sample = "some text info before message sequence\n"+
                "============\n"+
                fstMsgHeader+"\n"+
                "============\n"+
                fstMsgBody+"\n"+
                "============\n"+
                sndMsgHeader+"\n"+
                "============\n"+
                sndMsgBody +"\n";
        System.out.println(sample);
        // Separator line, header line, separator line, then the body
        String regex = "^=+$\n"+
                "^(.+)$\n"+
                "^=+$\n"+
                "((?s:(?!(^=.+)).+))";
        Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = p.matcher(sample);
        int blockNumber = 1;
        while (matcher.find()) {
            System.out.println("Block "+blockNumber+": "+matcher.group(0)+"\n_________________");
            if (blockNumber == 1) {
                Assert.assertEquals(fstMsgHeader, matcher.group(1));
                // Fails with the regex above: group(2) also contains the second message
                Assert.assertEquals(fstMsgBody, matcher.group(2));
            } else {
                Assert.assertEquals(sndMsgHeader, matcher.group(1));
                Assert.assertEquals(sndMsgBody, matcher.group(2));
            }
            blockNumber++; // the original test never advanced the counter
        }
    }
}
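
One possible reading of the failure, as a sketch rather than a definitive diagnosis: the negative lookahead (?!(^=.+)) is checked only once, at the first character of the body, and after that the DOTALL .+ is free to run to the end of the input. The hypothetical class below (LookaheadDemo, ONCE, EACH and bodies are names made up for illustration) compares the original pattern with a variant that repeats the lookahead before every consumed character. The variant assumes, like the original, that a separator is a line consisting only of = characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Pattern from the question: the lookahead guards only the first body character.
    static final Pattern ONCE = Pattern.compile(
            "^=+$\n^(.+)$\n^=+$\n((?s:(?!(^=.+)).+))", Pattern.MULTILINE);

    // Illustrative variant: the lookahead is re-checked before each character, so the
    // body stops before a newline that is followed by a separator line or by end of input.
    static final Pattern EACH = Pattern.compile(
            "^=+$\n^(.+)$\n^=+$\n((?s:(?:(?!\n(?:=+$|\\z)).)+))", Pattern.MULTILINE);

    /** Collects group(2), the message body, of every match. */
    static List<String> bodies(Pattern p, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "intro\n====\nH1 = x\n====\nb1\nb1b\n====\nH2\n====\nb2\n";
        System.out.println("once: " + bodies(ONCE, sample).size() + " match(es)");
        System.out.println("each: " + bodies(EACH, sample).size() + " match(es)");
    }
}
```

Repeating a lookahead per character costs more backtracking work than a lazy quantifier, so on large inputs a lazy match with a trailing lookahead may be the better choice.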

Best answer

I'm not sure whether this is what you are looking for, but maybe this regex will help:

String regex = 
"={12}\n" + // twelve '=' marks and new line mark
"(.+?)" + // minimal match that has
"\n={12}\n" + // new line mark with twelve '=' marks after it
"(.+?)(?=\n={12}|$)"; // minimal match that will have new line
// character and twelve `=` marks after
// it or end of data $

For this to work, the dot must also match newline characters, so compile the pattern with the Pattern.DOTALL flag:

Pattern p = Pattern.compile(regex, Pattern.DOTALL);
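
To see the answer's pattern in action, here is a minimal, self-contained sketch that applies it to the sample from the question. The class name BlockExtractor and the method extract are hypothetical helpers introduced only for illustration. The regex is the one given above, so it assumes every separator is exactly twelve = characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockExtractor {
    // Regex from the answer: a header between two 12-'=' lines, then a lazily
    // matched body that ends before the next separator or at the end of the data.
    private static final Pattern BLOCK = Pattern.compile(
            "={12}\n" +
            "(.+?)" +
            "\n={12}\n" +
            "(.+?)(?=\n={12}|$)",
            Pattern.DOTALL);

    /** Returns {header, body} pairs in order of appearance (illustrative helper). */
    static List<String[]> extract(String text) {
        List<String[]> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(new String[] { m.group(1), m.group(2) });
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "some text info before message sequence\n"
                + "============\n"
                + "first message header that should be parsed (may contain = character)\n"
                + "============\n"
                + "first multiline\nmessage body that\nshould also be parsed\n(may contain = character)\n"
                + "============\n"
                + "second message header that should be parsed\n"
                + "============\n"
                + "second multiline\nmessage body that\nshould also be parsed\n... and so on\n";
        for (String[] block : extract(sample)) {
            System.out.println("Header: " + block[0]);
            System.out.println("Body:\n" + block[1]);
            System.out.println("_________________");
        }
    }
}
```

Because the pattern is compiled without Pattern.MULTILINE, $ matches only at the end of the input (or just before a final line terminator), which is what lets the last body stop cleanly. If separators can vary in length, ={12} would need to be relaxed, for example to =+, at the cost of treating any run of = after a newline as a separator.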

Regarding "java - Capturing multiple text blocks with a regular expression in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18338465/
