
Java - Splitting a file into sections

Reposted. Author: 行者123. Last updated: 2023-12-02 01:41:24

I am working on a project that takes a file and saves its sections. A section can be numbered

1.

2.

3.

and so on, but it can also be numbered

1.1

2.3.1.II.

and so on.

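I already know the basics of reading a file. What I need to know is whether there is a good way to detect these section numbers in the text and split the text into sections.

I have considered regular expressions, but I don't know regex well enough to do this. Any suggestions?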

Update

Example:

1. Position
1.1. Position.
1.2. Scope
1.3. Location.
2. Compensation
2.1. Schedule
2.2.
3. Term
3.1. Term.
3.1.i. bla
3.1.ii. bla bla

Best answer

You can use the following regex to split each line, capturing the numbering in group 1 and the paragraph text in group 2.

^((?:[a-zA-Z\d]{1,2}\.)+)\s+(.*)

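Here, ^((?:[a-zA-Z\d]{1,2}\.)+) captures the numbering: one or two alphanumeric characters followed by a literal dot, with that whole unit repeated one or more times. The numbering must be followed by whitespace, hence \s+, and (.*) then captures the rest of the line, which is assumed to be the paragraph text. Note that {1,2} limits each component of the numbering to two characters; the Java code below uses + instead, so longer components are accepted as well. This is what I came up with from the sample data you provided. If you need to cover more varied cases, please add more examples and I will refine the solution further.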
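To make the effect of the {1,2} quantifier concrete, here is a minimal sketch that compares the pattern above with a variant whose components are unbounded. The class name `NumberingPatternDemo`, the constants, and the helper `numbering` are illustrative names, not part of the answer, and the three test lines are made up for the comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberingPatternDemo {
    // Pattern as written in the answer text: each numbering component is 1-2 characters.
    static final Pattern BOUNDED =
            Pattern.compile("^((?:[a-zA-Z\\d]{1,2}\\.)+)\\s+(.*)");
    // Variant with unbounded components.
    static final Pattern UNBOUNDED =
            Pattern.compile("^((?:[a-zA-Z\\d]+\\.)+)\\s+(.*)");

    // Returns the numbering prefix (group 1), or null if the whole line does not match.
    static String numbering(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {"3.1.ii. bla bla", "3.1.iii. more", "123. long number"};
        for (String line : lines) {
            System.out.println(line
                    + " | bounded: " + numbering(BOUNDED, line)
                    + " | unbounded: " + numbering(UNBOUNDED, line));
        }
    }
}
```

With the bounded pattern, a component such as iii or 123 is rejected because it is longer than two characters, so the whole line fails to match. Both patterns also accept ordinary prose that happens to start with dotted abbreviations, for example a line beginning with "e.g. ", so the input is assumed to contain no such lines.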


Here is some sample Java code:

import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionSplitter {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("1. Position", "1.1. Position.", "1.2. Scope", "1.3. Location. ",
                "2. Compensation", "2.1. Schedule", "2.2. ", "3. Term", "3.1. Term.", "3.1.i. bla", "3.1.ii. bla bla",
                "12.a. some para", "13.a. some para", "1.a. some para", "A.1.a. another para", "B.1.a. some para");
        // Group 1: the numbering (e.g. "3.1.ii."); group 2: the rest of the line.
        Pattern p = Pattern.compile("^((?:[a-zA-Z\\d]+\\.)+)\\s+(.*)");

        list.forEach(x -> {
            Matcher m = p.matcher(x);
            if (m.matches()) {
                System.out.println(x + " --> " + "number section: (" + m.group(1) + ")" + " para section: (" + m.group(2) + ")");
            }
        });
    }
}

This prints:

1. Position --> number section: (1.) para section: (Position)
1.1. Position. --> number section: (1.1.) para section: (Position.)
1.2. Scope --> number section: (1.2.) para section: (Scope)
1.3. Location. --> number section: (1.3.) para section: (Location. )
2. Compensation --> number section: (2.) para section: (Compensation)
2.1. Schedule --> number section: (2.1.) para section: (Schedule)
2.2. --> number section: (2.2.) para section: ()
3. Term --> number section: (3.) para section: (Term)
3.1. Term. --> number section: (3.1.) para section: (Term.)
3.1.i. bla --> number section: (3.1.i.) para section: (bla)
3.1.ii. bla bla --> number section: (3.1.ii.) para section: (bla bla)
12.a. some para --> number section: (12.a.) para section: (some para)
13.a. some para --> number section: (13.a.) para section: (some para)
1.a. some para --> number section: (1.a.) para section: (some para)
A.1.a. another para --> number section: (A.1.a.) para section: (another para)
B.1.a. some para --> number section: (B.1.a.) para section: (some para)
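The question is about splitting a whole file into sections, whereas the code above matches one line at a time. One possible way to bridge that gap is sketched here, under two assumptions that are not in the answer: every section starts on a line that matches the pattern, and any non-blank line that does not match continues the previous section. The names `SectionSplitterSketch` and `split` and the map-based result are illustrative choices. The text after the numbering is also made optional here, so that a bare heading such as "2.2." with no trailing space is still recognized.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionSplitterSketch {
    // Same numbering pattern as the answer's Java code, but the whitespace and
    // text after the numbering are optional (an assumption of this sketch).
    private static final Pattern HEADING =
            Pattern.compile("^((?:[a-zA-Z\\d]+\\.)+)(?:\\s+(.*))?$");

    // Maps each numbering, in document order, to the text of its section.
    static Map<String, String> split(List<String> lines) {
        Map<String, StringBuilder> sections = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String line : lines) {
            Matcher m = HEADING.matcher(line);
            if (m.matches()) {
                // A heading line starts a new section.
                String text = m.group(2) == null ? "" : m.group(2).trim();
                current = new StringBuilder(text);
                sections.put(m.group(1), current);
            } else if (current != null && !line.trim().isEmpty()) {
                // Assumption: a non-heading, non-blank line continues the current section.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line.trim());
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        sections.forEach((number, text) -> result.put(number, text.toString()));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1. Position", "1.1. Position.", "continued on a second line",
                "2.2.", "3.1.ii. bla bla");
        split(lines).forEach((number, text) ->
                System.out.println(number + " -> [" + text + "]"));
    }
}
```

In a real program the list of lines could come from `Files.readAllLines`. Two limitations of this sketch are worth keeping in mind: because the text is optional, a continuation line consisting of a single word ending in a dot (for example "end.") is treated as a heading, and if the same numbering appears twice, the later section replaces the earlier one in the map.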

Regarding Java - Splitting a file into sections, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54408294/
