
java - How to split a string with a Java regex and include variable-width delimiters in the result list

Reposted. Author: 行者123. Updated: 2023-11-30 08:06:28

I want to split a string into quoted and unquoted parts, where escaped quotes are ignored. For example, the following input:

String input = "Example with \"quoted \\\"test\\\" region\" embedded";

should produce the following list:

String[] result = {"Example with", "\"quoted \\\"test\\\" region\"", "embedded"};

To split on the quoted regions (while ignoring escaped quotes), I use:

public static final String QUOTE_PATTERN = "(?<!\\\\)\".*?(?<!\\\\)\"";

String input = "Example with \"quoted \\\"test\\\" region\" embedded";
String[] result = input.split(QUOTE_PATTERN);
System.out.println(Arrays.toString(result));

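This produces the expected output [Example with ,  embedded]. However, I would very much like the delimiters (the quoted regions) to be included in this list as well. (Of course, I could do this by using a Matcher to get the start and end indices, but that would still require a lot of extra code.)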
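For reference, the Matcher-based route mentioned in the parenthesis might look roughly like the sketch below. It reuses the quote pattern from the question; the class name `QuoteSplitter` and the helper `splitKeepingQuotes` are hypothetical names chosen for illustration, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteSplitter {
    // Same pattern as in the question: an unescaped quote, then lazily
    // everything up to the next unescaped quote.
    private static final Pattern QUOTED =
        Pattern.compile("(?<!\\\\)\".*?(?<!\\\\)\"");

    // Hypothetical helper: returns the unquoted and quoted parts in order.
    public static List<String> splitKeepingQuotes(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        int last = 0; // end index of the previous quoted region
        while (m.find()) {
            if (m.start() > last) {
                // Text between the previous quoted region and this one.
                parts.add(input.substring(last, m.start()));
            }
            parts.add(m.group()); // the quoted region itself
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // trailing unquoted text
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "Example with \"quoted \\\"test\\\" region\" embedded";
        System.out.println(splitKeepingQuotes(input));
    }
}
```

For the example input this prints [Example with , "quoted \"test\" region",  embedded], with the surrounding whitespace left untrimmed.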

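I found a solution that uses look-ahead and look-behind to split a string while keeping the delimiters. It successfully splits a colon-separated string into a list that also contains the colons: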

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public static final String COLON_PATTERN = String.format(WITH_DELIMITER, ":");

String colonTest = "Part0:Part1:Part2";
String[] parts = colonTest.split(COLON_PATTERN);

System.out.println(Arrays.toString(parts));

This produces the following output: [Part0, :, Part1, :, Part2]

However, this does not seem to be applicable to variable-length delimiters, because:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public static final String QUOTE_PATTERN =
    String.format(WITH_DELIMITER, "(?<!\\\\)\".*?(?<!\\\\)\"");

String input = "Example with \"quoted \\\"test\\\" region\" embedded";
String[] result = input.split(QUOTE_PATTERN);
System.out.println(Arrays.toString(result));

throws the following error:

Exception in thread "main" java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 23
((?<=(?<!\\)".*?(?<!\\)")|(?=(?<!\\)".*?(?<!\\)"))
                       ^

Does anyone know whether something similar can be achieved for variable-width delimiters?

Thanks!

Best answer

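Since your strings are no longer than 200 characters, you can use Java's constrained-width look-behind: Java's look-behind supports a {0,200} quantifier, where both the minimum and the maximum length are specified.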

✽ Java accepts quantifiers within lookbehind, as long as the length of the matching strings falls within a pre-determined range. For instance, (?<=cats?) is valid because it can only match strings of three or four characters. Likewise, (?<=A{1,10}) is valid.
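As a small illustration of the quoted rule, the sketch below splits a string at every position that is preceded by a bounded look-behind body. The class name `BoundedLookbehindDemo` and the helper `splitAfter` are hypothetical, and the inputs are made up for the demonstration.

```java
import java.util.Arrays;

class BoundedLookbehindDemo {
    // Hypothetical helper: split `text` at every position preceded by `body`.
    // `body` is assumed to have an obvious maximum length, e.g. "cats?".
    public static String[] splitAfter(String body, String text) {
        return text.split("(?<=" + body + ")");
    }

    public static void main(String[] args) {
        // "cats?" matches three or four characters, so the look-behind compiles.
        System.out.println(Arrays.toString(splitAfter("cats?", "catsup")));
        // "A{1,10}" matches between one and ten characters.
        System.out.println(Arrays.toString(splitAfter("A{1,10}", "AAB")));
    }
}
```

The first call splits after both "cat" and "cats", giving [cat, s, up]; the second gives [A, A, B].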

So you can use this code:

String.format(WITH_DELIMITER, "(?<!\\\\)\".{0,200}(?<!\\\\)\"");
                                           ^^^^^^^

See the IDEONE demo

String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
String QUOTE_PATTERN =
    String.format(WITH_DELIMITER, "(?<!\\\\)\".{0,200}(?<!\\\\)\"");

String input = "Example with \"quoted \\\"test\\\" region\" embedded";
String[] result = input.split(QUOTE_PATTERN);
System.out.println(Arrays.toString(result));

Output:

[Example with , "quoted \"test\" region",  embedded]
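The output above keeps the whitespace around the quoted region, whereas the list requested in the question has it trimmed. A minimal sketch of one way to post-process the split result follows; the class name `TrimmedQuoteSplit` and the helper `splitAndTrim` are hypothetical, and the approach has only been checked against the single-region example input from the question.

```java
import java.util.Arrays;

class TrimmedQuoteSplit {
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
    // Bounded variant from the answer: at most 200 characters between the quotes.
    static final String QUOTE_PATTERN =
        String.format(WITH_DELIMITER, "(?<!\\\\)\".{0,200}(?<!\\\\)\"");

    // Hypothetical helper: split as in the answer, then trim each part
    // and drop parts that end up empty.
    public static String[] splitAndTrim(String input) {
        return Arrays.stream(input.split(QUOTE_PATTERN))
                     .map(String::trim)
                     .filter(s -> !s.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String input = "Example with \"quoted \\\"test\\\" region\" embedded";
        System.out.println(Arrays.toString(splitAndTrim(input)));
    }
}
```

Trimming is applied after the split so that the regex itself stays identical to the one in the answer.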

Regarding "java - How to split a string with a Java regex and include variable-width delimiters in the result list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31025327/
