
java - How to detect whitespace that is not inside single or double quotes

Reposted. Author: 搜寻专家. Updated: 2023-10-31 19:37:38

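I'm trying to create a Java regex that replaces every run of whitespace in a string with a single space, unless the whitespace occurs between quotes (single or double).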

If I were only looking for double quotes, I could use a lookahead:

text.replaceAll("\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", " ");
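For reference, here is a minimal sketch of that lookahead in use (the class and method names are hypothetical). A whitespace run is replaced only when the lookahead sees an even number of `"` characters between it and the end of the input, which is the case when the run is outside double quotes; whitespace inside single quotes gets no protection at all.

```java
// Hypothetical demo class; the pattern is the double-quote-only lookahead from the question.
class DoubleQuoteOnly {
    // \s+ followed by zero or more pairs of double quotes and then no further double quote
    // up to the end: an even number of " remain, so the run is outside any "..." section.
    static String collapse(String text) {
        return text.replaceAll("\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", " ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b \"c   d\"   e")); // a b "c   d" e
        System.out.println(collapse("a   b 'c   d'   e"));   // a b 'c d' e
    }
}
```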

If I were only looking for single quotes, I could use a similar pattern.

The trick is handling both.

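I had the bright idea of running the double-quote pattern first and then the single-quote pattern, but of course that ends up replacing all the whitespace regardless of the quotes: each pass treats the other quote character as ordinary text, so it collapses the whitespace inside the other kind of quotes.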
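The failure of the two-pass idea can be reproduced with a short sketch. The class name is hypothetical, and the single-quote pattern is assumed to be the double-quote lookahead with `\"` swapped for `'`.

```java
// Hypothetical demo of running the double-quote pattern, then the single-quote pattern.
class TwoPassAttempt {
    static final String DOUBLE = "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
    // Assumed single-quote variant: same lookahead with ' instead of ".
    static final String SINGLE = "\\s+(?=(?:[^']*'[^']*')*[^']*$)";

    static String collapse(String text) {
        // Pass 1 protects only "..." text, so it collapses the whitespace inside '...'.
        String afterDouble = text.replaceAll(DOUBLE, " ");
        // Pass 2 protects only '...' text, so it collapses the whitespace inside "...".
        return afterDouble.replaceAll(SINGLE, " ");
    }

    public static void main(String[] args) {
        // Both quoted sections lose their inner whitespace.
        System.out.println(collapse("a 'b   c' \"d   e\"")); // a 'b c' "d e"
    }
}
```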

So here are some tests and the expected results:

a   b   c    d   e   -->  a b c d e
a b "c d" e --> a b "c d" e
a b 'c d' e --> a b 'c d' e
a b "c d' e --> a b "c d' e (Can't mix and match quotes)

Is there any way to achieve this with a Java regex?

Assume that invalid input has already been validated separately, so none of the following will occur:

a "b c ' d
a 'b " c' d
a 'b c d
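One common way to meet these requirements, sketched here under the question's assumptions (balanced quotes, no escape sequences) and not taken from the answer below, is to let a single scan match either a whole quoted string or a whitespace run, keep the quoted strings verbatim, and collapse only the whitespace runs. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: alternation of quoted strings and whitespace runs.
class CollapseOutsideQuotes {
    // At each position the engine tries a complete "..." string, then a complete '...' string,
    // then a whitespace run. A quoted string is consumed whole, so its inner whitespace is
    // never reached by the \s+ branch. Assumes no escaped quotes inside quoted strings.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|'[^']*'|\\s+");

    static String collapse(String text) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            char first = token.charAt(0);
            // Keep quoted strings verbatim; collapse a whitespace run to one space.
            String replacement = (first == '"' || first == '\'') ? token : " ";
            // quoteReplacement keeps '$' or '\' inside quoted text from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b   c    d   e")); // a b c d e
        System.out.println(collapse("a b \"c   d\" e"));   // a b "c   d" e
        System.out.println(collapse("a b 'c   d' e"));     // a b 'c   d' e
        System.out.println(collapse("a b \"c d' e"));      // a b "c d' e
    }
}
```

Because each match starts where the previous one ended and the quoted-string alternatives are tried before `\s+`, the scan never starts a match inside a properly closed quoted section; an unmatched quote (as in the mixed-quote test) is simply treated as ordinary text.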

Best answer

Edit: note that this answer has bugs/flaws.

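It requires whitespace between a closing quote (" or ') and the character that follows it in order to match the quoted string correctly, so this answer does not handle some text correctly.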

It may have more bugs, but that is the only one I know of.
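The flaw can be reproduced with the regex and replacement from this answer, copied into a hypothetical demo class. When the closing quote is followed by whitespace, the quoted text is preserved; when it is followed directly by another character, the match attempt starting at the opening quote fails, and the engine later matches the whitespace inside the quotes on its own.

```java
// Hypothetical demo class; REGEX and the "$1$2 " replacement are the ones from this answer.
class FlawDemo {
    static final String REGEX =
            "([^\\s\"'\\\\]+)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*(\\s+)";

    static String collapse(String text) {
        return text.replaceAll(REGEX, "$1$2 ");
    }

    public static void main(String[] args) {
        // Closing quote followed by whitespace: the quoted run is preserved.
        System.out.println(collapse("a \"b   c\" d"));     // a "b   c" d
        // Closing quote followed directly by 'd': the whitespace inside the quotes is collapsed.
        System.out.println(collapse("a \"b   c\"d   e")); // a "b c"d e
    }
}
```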

Edit: alternative answer

I have added another, better-optimised answer that does not have this bug.

I'm leaving this one here for posterity.

Supports

This one supports quotes escaped as \" and \', as well as quoted strings that span multiple lines.

Regex

([^\s"'\\]+)*("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*(\s+)

https://regex101.com/r/wT6tU2/1
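To see what each group holds, the hypothetical helper below runs the regex above with `Matcher.find()` and prints groups 1 to 3 for every match. A group that did not take part in a match (Java returns `null` for it) is shown as an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for inspecting the answer's capturing groups.
class ShowGroups {
    // The answer's regex, written as a Java string literal.
    static final Pattern P = Pattern.compile(
            "([^\\s\"'\\\\]+)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*(\\s+)");

    // One "[g1|g2|g3]" entry per match; a group that did not participate prints as "".
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = P.matcher(text);
        while (m.find()) {
            out.append('[').append(orEmpty(m.group(1)))
               .append('|').append(orEmpty(m.group(2)))
               .append('|').append(m.group(3)).append(']');
        }
        return out.toString();
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        System.out.println(describe("a \"b   c\" d")); // [a|| ][|"b   c"| ]
    }
}
```

The trailing `d` produces no match because there is no whitespace after it, so it is left untouched by a replacement.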

Replacement

"$1$2 " (yes, there is a space at the end)
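Applied with `String.replaceAll`, the regex and the `$1$2 ` replacement behave as follows (the class name is hypothetical). Each match is an unquoted word (group 1) or quoted string (group 2) followed by a whitespace run, and the replacement writes that text back followed by exactly one space, so whitespace inside a quoted string that is followed by whitespace is kept. The last input exercises the escaped-quote support described above.

```java
// Hypothetical demo class; REGEX and the replacement are the ones from this answer.
class AnswerRegex {
    static final String REGEX =
            "([^\\s\"'\\\\]+)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*(\\s+)";

    static String collapse(String text) {
        // $1 = unquoted run, $2 = quoted string, then a single space for the whitespace run.
        return text.replaceAll(REGEX, "$1$2 ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b   c    d   e"));              // a b c d e
        System.out.println(collapse("a   b   \"c   d\"   e"));          // a b "c   d" e
        System.out.println(collapse("a   b   'c   d'   e"));            // a b 'c   d' e
        System.out.println(collapse("a   \"b \\\"c   d\\\" e\"   f")); // a "b \"c   d\" e" f
    }
}
```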


Code

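import java.util.regex.PatternSyntaxException;

String resultString = subjectString; // falls back to the unchanged input if replaceAll throws
try {
    // Rewrites each whitespace run as one space, re-emitting the text captured in groups 1 and 2
    resultString = subjectString.replaceAll("([^\\s\"'\\\\]+)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*(\\s+)", "$1$2 ");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
    // Syntax error in the replacement text (unescaped $ signs?)
} catch (IndexOutOfBoundsException ex) {
    // Non-existent backreference used in the replacement text
}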

Human readable

// ([^\s"'\\]+)*("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*(\s+)
//
// Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks; Regex syntax only
//
// Match the regex below and capture its match into backreference number 1 «([^\s"'\\]+)*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
// Or, if you don’t want to capture anything, replace the capturing group with a non-capturing group to make your regex more efficient.
// Match any single character NOT present in the list below «[^\s"'\\]+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// A “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
// A single character from the list “"'” «"'»
// The backslash character «\\»
// Match the regex below and capture its match into backreference number 2 «("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
// Or, if you don’t want to capture anything, replace the capturing group with a non-capturing group to make your regex more efficient.
// Match this alternative (attempting the next alternative only if this one fails) «"[^"\\]*(?:\\.[^"\\]*)*"»
// Match the character “"” literally «"»
// Match any single character NOT present in the list below «[^"\\]*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// The literal character “"” «"»
// The backslash character «\\»
// Match the regular expression below «(?:\\.[^"\\]*)*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// Match the backslash character «\\»
// Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.»
// Match any single character NOT present in the list below «[^"\\]*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// The literal character “"” «"»
// The backslash character «\\»
// Match the character “"” literally «"»
// Or match this alternative (the entire group fails if this one fails to match) «'[^'\\]*(?:\\.[^'\\]*)*'»
// Match the character “'” literally «'»
// Match any single character NOT present in the list below «[^'\\]*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// The literal character “'” «'»
// The backslash character «\\»
// Match the regular expression below «(?:\\.[^'\\]*)*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// Match the backslash character «\\»
// Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.»
// Match any single character NOT present in the list below «[^'\\]*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// The literal character “'” «'»
// The backslash character «\\»
// Match the character “'” literally «'»
// Match the regex below and capture its match into backreference number 3 «(\s+)»
// Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
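The warning above that a repeated capturing group "will capture only the last iteration" has a visible effect on this answer: group 2 can match several adjacent quoted strings within one match, but `$2` re-emits only the last of them, so the earlier strings are dropped. A minimal demo, with a hypothetical class name:

```java
// Hypothetical demo class; REGEX and the "$1$2 " replacement are the ones from this answer.
class LastIterationDemo {
    static final String REGEX =
            "([^\\s\"'\\\\]+)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*(\\s+)";

    static String collapse(String text) {
        return text.replaceAll(REGEX, "$1$2 ");
    }

    public static void main(String[] args) {
        // Group 2 matches "a" and then "b", but $2 holds only the last iteration, "b".
        System.out.println(collapse("\"a\"\"b\" c")); // "b" c
        System.out.println(collapse("'a'\"b\" c"));   // "b" c
    }
}
```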

For "java - How to detect whitespace that is not inside single or double quotes", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34343678/
