java - How do I find parentheses in a string independently of locale?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 22:59:38
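I need to find the first complete pair of parentheses in a Java String and, if it is not nested, return its content. The current problem is that parentheses may be represented by different characters in different locales/languages.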

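My first idea was of course to use a regular expression. But with something like "\((.*)\)" it seems rather hard (at least to me) to make sure that the match under consideration contains no nested parentheses, and there seems to be no bracket-like character class available in the Java matcher.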
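To make the difficulty with "\((.*)\)" concrete, here is a minimal sketch (the class and method names are made up for illustration) that compares the greedy pattern with a negated character class on a line containing two pairs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyParenthesesDemo {

    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(final String regex, final String input) {
        final Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(final String[] args) {
        final String input = "first (a) and second (b)";
        // Greedy ".*" runs on to the LAST closing parenthesis of the input.
        System.out.println(firstGroup("\\((.*)\\)", input));     // a) and second (b
        // A negated class stops at the first closing parenthesis
        // and cannot contain an opening one.
        System.out.println(firstGroup("\\(([^()]*)\\)", input)); // a
    }
}
```

Note that the negated class does not reject nested input by itself: for "(a (b) c)" it skips the outer pair and returns the inner "b".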

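So I tried to solve the problem more imperatively, but stumbled over the fact that the data I need to process is in different languages, and the parenthesis characters differ by locale. Western: (), Chinese (Locale "zh"): ()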

package main;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class FindParentheses {

    public static Set<String> searchNames(final String string) throws IOException {
        final Set<String> foundName = new HashSet<>();
        final BufferedReader stringReader = new BufferedReader(new StringReader(string));
        for (String line = stringReader.readLine(); line != null; line = stringReader.readLine()) {
            final int indexOfFirstOpeningBrace = line.indexOf('(');
            if (indexOfFirstOpeningBrace > -1) {
                final String afterFirstOpeningParenthesis = line.substring(indexOfFirstOpeningBrace + 1);
                final int indexOfNextOpeningParenthesis = afterFirstOpeningParenthesis.indexOf('(');
                final int indexOfNextClosingParenthesis = afterFirstOpeningParenthesis.indexOf(')');
                /*
                 * If the following condition is fulfilled, there is a simple braced expression
                 * after the found product's short name. Otherwise, there may be an additional
                 * nested pair of braces, or the closing brace may be missing, in which cases the
                 * expression is rejected as a product's long name.
                 */
                if (indexOfNextClosingParenthesis > 0
                        && (indexOfNextClosingParenthesis < indexOfNextOpeningParenthesis
                                || indexOfNextOpeningParenthesis < 0)) {
                    final String content = afterFirstOpeningParenthesis.substring(0, indexOfNextClosingParenthesis);
                    foundName.add(content);
                }
            }
        }
        return foundName;
    }

    public static void main(final String[] args) throws IOException {
        for (final String foundName : searchNames(
                "Something meaningful: shortName1 (LongName 1).\n" +
                "Localization issue here: shortName2 (保险丝2). This one should be found, too.\n" +
                "Easy again: shortName3 (LongName 3).\n" +
                "Yet more random text...")) {
            System.out.println(foundName);
        }
    }

}

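The second name, the one in Chinese parentheses, is not found, but it should be. Of course I could match those characters as an additional special case, but since my project uses 23 languages, including Korean and Japanese, I would prefer a solution that finds any pair of parentheses.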

Best answer

I'm guessing you might want to design an expression, maybe something like:

[((]\s*([^))]*)\s*[))]

where the parentheses you want to match go into these character classes:

[((]
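If the source file's encoding is a concern, the same character classes can be written with Unicode escapes instead of literal fullwidth characters. The following is only a sketch under that assumption; the class name, constant, and helper method are invented for illustration, and further bracket pairs would have to be appended to each class by hand:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketClassDemo {

    // Openers: ASCII '(' and fullwidth U+FF08. Closers: ASCII ')' and fullwidth U+FF09.
    // "\\uFF08" reaches the regex engine as \uFF08, which java.util.regex understands.
    static final Pattern PARENTHESIZED =
            Pattern.compile("[(\\uFF08]\\s*([^)\\uFF09]*)\\s*[)\\uFF09]");

    // Hypothetical helper: content of the first pair, or null if there is none.
    static String firstContent(final String input) {
        final Matcher matcher = PARENTHESIZED.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(final String[] args) {
        System.out.println(firstContent("shortName1 (LongName 1)."));
        System.out.println(firstContent("shortName2 \uFF08保险丝2\uFF09."));
    }
}
```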

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class re {
    public static void main(String[] args) {
        final String regex = "[((]\\s*([^))]*)\\s*[))]";
        final String string = "Something meaningful: shortName1 (LongName 1) Localization issue here: shortName2 (保险丝2). This one should be found, too. Easy again: shortName3 (LongName 3). Yet more random text... Something meaningful: shortName1 (LongName 1) Localization issue here: shortName2 (保险丝2). This one should be found, too. Easy again: shortName3 (LongName 3). Yet more random text...";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

Output

Full match: (LongName 1)
Group 1: LongName 1
Full match: (保险丝2)
Group 1: 保险丝2
Full match: (LongName 3)
Group 1: LongName 3
Full match: (LongName 1)
Group 1: LongName 1
Full match: (保险丝2)
Group 1: 保险丝2
Full match: (LongName 3)
Group 1: LongName 3
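The question also asks for only the first pair on each line, and only when that pair is not nested. One way this might be combined with the character classes above is sketched here; the names OPEN, CLOSE, and FIRST_SIMPLE_PAIR are hypothetical, and the sketch accepts mixed pairs such as an ASCII opener closed by a fullwidth closer:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstPairPerLine {

    private static final String OPEN = "(\\uFF08";   // '(' and fullwidth '('
    private static final String CLOSE = ")\\uFF09";  // ')' and fullwidth ')'

    // From the start of a line: skip text without openers, take the first opener,
    // then require non-empty content free of any opener or closer, then a closer.
    private static final Pattern FIRST_SIMPLE_PAIR = Pattern.compile(
            "^[^" + OPEN + "\\r\\n]*[" + OPEN + "]"
                    + "([^" + OPEN + CLOSE + "\\r\\n]+)"
                    + "[" + CLOSE + "]",
            Pattern.MULTILINE);

    // Collects the content of the first pair of each line, skipping lines
    // whose first pair is nested, empty, or never closed.
    static Set<String> searchNames(final String text) {
        final Set<String> names = new LinkedHashSet<>();
        final Matcher matcher = FIRST_SIMPLE_PAIR.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(final String[] args) {
        for (final String name : searchNames(
                "Something meaningful: shortName1 (LongName 1).\n"
                + "Localization issue here: shortName2 \uFF08保险丝2\uFF09. This one should be found, too.\n"
                + "Easy again: shortName3 (LongName 3).\n"
                + "Yet more random text...")) {
            System.out.println(name);
        }
    }
}
```

Because the pattern is anchored with ^ in MULTILINE mode, a line whose first opener starts a nested expression yields no match at all, which mirrors the rejection rule in the question's searchNames.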

Another option is:

(?<=[((])[^))]*(?=[))])

Output:

Full match: LongName 1
Full match: 保险丝2
Full match: LongName 3
Full match: LongName 1
Full match: 保险丝2
Full match: LongName 3
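For the many-languages case raised in the question, listing every bracket by hand may not scale. As an alternative that the answer itself does not propose, java.util.regex also accepts the Unicode general categories \p{Ps} (opening punctuation) and \p{Pe} (closing punctuation). The sketch below assumes that treating every such character as a bracket is acceptable; it therefore also matches [] and {}, and it does not check that the opener and closer belong to the same pair. The class, constant, and method names are invented for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeCategoryBrackets {

    // \p{Ps}: any opening punctuation; \p{Pe}: any closing punctuation.
    // The content may contain neither, so only non-nested pairs match.
    static final Pattern ANY_BRACKET_PAIR =
            Pattern.compile("\\p{Ps}([^\\p{Ps}\\p{Pe}]*)\\p{Pe}");

    // Hypothetical helper: content of the first non-nested pair, or null.
    static String firstContent(final String input) {
        final Matcher matcher = ANY_BRACKET_PAIR.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(final String[] args) {
        System.out.println(firstContent("shortName1 (LongName 1)."));
        System.out.println(firstContent("shortName2 \uFF08保险丝2\uFF09."));
        System.out.println(firstContent("shortName4 \u300CLongName 4\u300D."));
    }
}
```

Whether this is too permissive depends on the data: some quotation marks also fall into the Ps category, so an explicit character class remains the safer choice when the set of brackets is known.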

Demo

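The expression is explained in the top-right panel of regex101.com. If you want to explore, simplify, or modify it, see this link, where you can also watch how it matches against some sample inputs.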

References

List of all unicode's open/close brackets?

About java - How do I find parentheses in a string independently of locale?: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57311468/
