java - How do I write a regular expression to split a string in this format?

Reposted. Author: 行者123. Updated: 2023-12-02 03:25:13

I want to split a string on any of the characters [,.!?;~], but keep each delimiter in its original position. For example:

This is the example, but it is not enough

[This is the example,, but it is not enough] // length=2
[0]=This is the example,
[1]=but it is not enough

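As you can see, the comma stays where it was. I did this with the regex (?<=([,.!?;~])+). However, if certain special words (for example: but) come right after one of [,.!?;~], I do not want the string to be split at that point. For example: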

I want this sentence to be split into this form, but how to do. So if anyone can help, that will be great

[0]=I want this sentence to be split into this form, but how to do.
[1]=So if anyone can help,
[2]=that will be great

As you can see, the part (form, but) in the first sentence was not split.

Best answer

I used:

  1. Positive lookbehind (?<=a)b to keep the delimiter.
  2. Negative lookahead a(?!b) to exclude the stop words.
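To make the two lookarounds concrete, here is a minimal sketch that applies each one on its own to a short sample sentence. The class name LookaroundDemo, the method names, and the sample text are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

class LookaroundDemo {
    // Plain delimiter: the comma is consumed by the split and disappears.
    static String[] splitDroppingComma(String text) {
        return text.split(",");
    }

    // Positive lookbehind: matches the empty position right after a comma,
    // so nothing is consumed and the comma stays on the left-hand token.
    static String[] splitKeepingComma(String text) {
        return text.split("(?<=,)");
    }

    // Same lookbehind, plus a negative lookahead that rejects positions
    // followed by optional whitespace and then "but".
    static String[] splitKeepingCommaExceptBeforeBut(String text) {
        return text.split("(?<=,)(?!\\s*but)");
    }

    public static void main(String[] args) {
        String text = "one, but two, three";
        System.out.println(Arrays.toString(splitDroppingComma(text)));
        System.out.println(Arrays.toString(splitKeepingComma(text)));
        System.out.println(Arrays.toString(splitKeepingCommaExceptBeforeBut(text)));
    }
}
```

For the sample text, the first method yields the tokens "one", " but two", " three"; the second yields "one,", " but two,", " three"; the third yields "one, but two," and " three", because the position after the first comma is followed by "but".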

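Note how I appended the regex (?!\\s*(but|and|if)) after the regex you provided (the doubled backslash is Java string-literal escaping). You can put all the stop words you need to exclude (for example, but, and, if) inside the parentheses, separated by the pipe symbol (|).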
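If the stop words live in a list rather than in the regex literal, the alternation can be assembled at runtime. The sketch below is one possible way to do that; StopWordSplitter, buildPattern, and split are hypothetical names, not something from the original answer. It also makes two changes that are assumptions on my part: it adds a word boundary (\b) so that a stop word such as "but" does not also block a split before "butter", and it drops the + and the capturing group from the lookbehind, which selects the same positions because a single delimiter before the position is enough.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StopWordSplitter {
    // Hypothetical helper: builds the split pattern from a list of stop words.
    static Pattern buildPattern(List<String> stopWords) {
        String lookbehind = "(?<=[,.!?;~])"; // position right after a delimiter
        if (stopWords.isEmpty()) {
            return Pattern.compile(lookbehind); // nothing to exclude
        }
        // Quote each word so regex metacharacters are taken literally,
        // then join the words into one alternation, e.g. but|and|if.
        String alternation = stopWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Reject the position when a stop word follows as a whole word.
        return Pattern.compile(lookbehind + "(?!\\s*(?:" + alternation + ")\\b)");
    }

    static String[] split(String text, List<String> stopWords) {
        return buildPattern(stopWords).split(text);
    }

    public static void main(String[] args) {
        List<String> stopWords = Arrays.asList("but", "and", "if");
        for (String token : split("Add salt, butter, and pepper. Done", stopWords)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With the sample sentence, the split happens before "butter" (not a whole-word match for "but") but not before "and", so the tokens are "Add salt,", " butter, and pepper.", and " Done". The tokens keep the space that followed the delimiter, because the pattern matches an empty position and consumes nothing.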

Also note that the delimiters stay in their original positions.

Output

Count of tokens = 3
I want this sentence to be split into this form, but how to do.
So if anyone can help,
that will be great

Code

public class HelloWorld {
    public static void main(String[] args) {
        String str = "I want this sentence to be split into this form, but how to do. So if anyone can help, that will be great";

        // Split at the position right after a delimiter (the lookbehind keeps
        // the delimiter in the token), unless a stop word follows (the
        // negative lookahead rejects that position).
        String[] tokensVal = str.split("(?<=([,.!?;~])+)(?!\\s*(but|and|if))");

        // Print the number of tokens
        System.out.println("Count of tokens = " + tokensVal.length);

        // The split is zero-width, so each later token still starts with the
        // space that followed the delimiter; trim() removes it for printing.
        for (String token : tokensVal) {
            System.out.println(token.trim());
        }
    }
}
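Two edge cases are worth a quick look: whitespace after a delimiter ends up at the start of the next token, and a run of delimiters such as "?!" is split between its characters. The following sketch shows one way to handle both, under my own assumptions rather than anything stated in the answer: the separator consumes trailing whitespace, and an extra negative lookahead refuses to split when another delimiter follows. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class TrimmedSplitDemo {
    private static final Pattern SEPARATOR = Pattern.compile(
            "(?<=[,.!?;~])"                 // right after a delimiter
            + "(?![,.!?;~])"                // not in the middle of a run like "?!"
            + "(?!\\s*(?:but|and|if)\\b)"   // not before a whole-word stop word
            + "\\s*");                      // consume whitespace after the delimiter

    static String[] split(String text) {
        return SEPARATOR.split(text);
    }

    public static void main(String[] args) {
        for (String token : split("Really?! Yes, but no. OK")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

For the sample input this yields "Really?!", "Yes, but no.", and "OK", with no leading spaces. Whether runs of punctuation should stay together depends on the data, so treat the second lookahead as optional.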

Regarding "java - How do I write a regular expression to split a string in this format?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39030707/
