java - Why do I get empty tokens from Java String.split(regex)?

Reposted. Author: 行者123. Last updated: 2023-11-29 07:03:03

I am new to regular expressions, and I am trying to use one to parse tokens delimited by "(", ")", and whitespace. Here is my attempt:

String str = "(test (_bit1 _bit2 |bit3::&92;test#4|))";
String[] tokens = str.split("[\\s*[()]]");
for (int i = 0; i < tokens.length; i++) {
    System.out.println(i + " : " + tokens[i]);
}

I expected the following output:

0 : test
1 : _bit1
2 : _bit2
3 : |bit3::&92;test#4|

But the actual output contains two empty tokens:

0 :
1 : test
2 :
3 : _bit1
4 : _bit2
5 : |bit3::&92;test#4|

I don't understand why there are two empty tokens, at positions 0 and 2. Can anyone give me a hint? Thanks.

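===== UPDATE =====

There was an answer from Alan Moore, which he has since deleted. I liked that answer, though, so I am copying it here for my own reference.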

Your regex, [\s*[()]], matches one whitespace character (\s) or one of the characters *, (, or ). The delimiter at the beginning of the string, (, is why you get the empty first token. There's no way around that; you just have to check for an empty first token and ignore it.
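To see where the delimiters actually fall, here is a minimal sketch that lists every match of the question's regex together with its start index. The class name DelimiterTrace and the helper delimiterMatches are illustrative only and are not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterTrace {
    // Hypothetical helper: returns "index:'text'" for every match of regex in input.
    static List<String> delimiterMatches(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.start() + ":'" + m.group() + "'");
        }
        return hits;
    }

    public static void main(String[] args) {
        String str = "(test (_bit1 _bit2 |bit3::&92;test#4|))";
        // Same regex as in the question: one whitespace char, '*', '(' or ')'.
        for (String hit : delimiterMatches("[\\s*[()]]", str)) {
            System.out.println(hit);
        }
    }
}
```

Each match is exactly one character long. The first match sits at index 0, and the matches at indexes 5 (the space) and 6 (the parenthesis) are adjacent, so split() finds nothing before the first delimiter and nothing between those two.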

The second empty token is between the first space and the ( that follows it. That one's on you, because you used * (zero or more) instead of + (one or more). But fixing it isn't that simple. You want to split on spaces, parens, or both, but you have to make sure there's at least one character, whichever it is. This might do it:

\s*[()]+\s*|\s+

But you probably should allow for spaces between parens, too:

\s*(?:[()]+\s*)+|\s+

As a Java string literal, that would be:

"\\s*(?:[()]+\\s*)+|\\s+"
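As a rough sketch of how that regex could be used, assuming the only cleanup needed is dropping the empty first token that a leading delimiter produces. The class ParenSplit, the constant DELIMS, and the method tokens are made-up names for illustration.

```java
import java.util.Arrays;

class ParenSplit {
    // Regex from the quoted answer, written as a Java string literal.
    static final String DELIMS = "\\s*(?:[()]+\\s*)+|\\s+";

    // Hypothetical helper: splits on DELIMS and drops an empty first token.
    static String[] tokens(String input) {
        String[] parts = input.split(DELIMS);
        // A delimiter at index 0 yields an empty first element; skip it.
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] t = tokens("(test (_bit1 _bit2 |bit3::&92;test#4|))");
        for (int i = 0; i < t.length; i++) {
            System.out.println(i + " : " + t[i]);
        }
    }
}
```

For the string in the question this prints the four tokens test, _bit1, _bit2 and |bit3::&92;test#4|. Trailing empty strings need no special handling because split(regex) already discards them.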

Best answer

Your regex is wrong; try this:

String[] tokens = str.split("[\\s\\(\\)]+");

String[] tokens = str.split("[\\s()]+"); //At least one character

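Update: I noticed that your code does in fact remove the parentheses, so it seems you don't have to escape them inside the brackets... I'm not sure why; can someone explain?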

Latest update: Thanks to @AlanMoore for the explanation. As I understand it, parentheses inside [] do not need to be escaped.
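A small sketch to check that claim: it splits the question's string with the escaped and the unescaped character class and compares the results. Both patterns still produce an empty first token because the string starts with a delimiter, so this sketch filters empty strings out. The names ClassEscapeDemo and nonEmptyTokens are hypothetical.

```java
import java.util.Arrays;

class ClassEscapeDemo {
    // Hypothetical helper: splits on regex and keeps only non-empty tokens.
    static String[] nonEmptyTokens(String input, String regex) {
        return Arrays.stream(input.split(regex))
                     .filter(s -> !s.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String str = "(test (_bit1 _bit2 |bit3::&92;test#4|))";
        String[] escaped = nonEmptyTokens(str, "[\\s\\(\\)]+");
        String[] unescaped = nonEmptyTokens(str, "[\\s()]+");
        // Inside a character class, ( and ) are literals with or without a backslash.
        System.out.println(Arrays.equals(escaped, unescaped));
        System.out.println(Arrays.toString(unescaped));
    }
}
```

Filtering every empty string is a blunter tool than checking only the first element, but for this input the two approaches give the same four tokens.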

Regarding "java - Why do I get empty tokens from Java String.split(regex)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22989662/
