java - How do [] change a Java regular expression?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 23:09:12
I have a regular expression that I use to validate UTF-8 characters.

String regex = "[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*";

I also want to check the length range, so I changed it to

String regex = "[[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*]";
String rangeRegex = regex + "{0,30}";

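Note that this is the same regex, only wrapped in [ ].

Now I can validate the range with rangeRegex, but regex itself no longer validates UTF-8 characters.

My question is: how do the [] affect regex? If I remove the [] from the regex, it validates UTF-8 characters but I cannot append the range. If I add the [], the range version works but the version without a range does not!

Sample test code: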

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// StringUtils is assumed to come from Apache Commons Lang 3; the post does not show its imports.
import org.apache.commons.lang3.StringUtils;

public class Test {

    // The pattern exactly as asked about: the original class and the * wrapped in an extra [ ]
    static String regex = "[[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*]";

    public static void main(String[] args) {
        String userId = null;
        //testUserId(userId);
        userId = "";
        testUserId(userId);
        userId = "æÆbBcCćĆčČçďĎdzDzdzsDzs";
        testUserId(userId);
        userId = "test123";
        testUserId(userId);
        userId = "abcxyzsd";
        testUserId(userId);

        String zip = "i«♣│axy";
        testZip(zip);
        zip = "331fsdfsdfasdfasd02c3";
        testZip(zip);
        zip = "331";
        testZip(zip);
    }

    /**
     * Without a range check.
     * @param userId the string to validate
     */
    static void testUserId(String userId) {
        boolean pass = true;
        if (!stringValidator(userId, regex)) {
            pass = false;
        }
        System.out.println(pass);
    }

    /**
     * With a range check.
     * @param zip the string to validate; blank values are not checked
     */
    static void testZip(String zip) {
        boolean pass = true;
        String regex1 = regex + "{0,10}";
        if (StringUtils.isNotBlank(zip) && !stringValidator(zip, regex1)) {
            pass = false;
        }
        System.out.println(pass);
    }

    // Returns true only if the whole string matches the regex
    static boolean stringValidator(String str, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        return matcher.matches();
    }
}

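Best answer

The explanations given are rather wrong for Java regex.

In Java, unescaped paired square brackets inside a character class are not treated as literal [ and ] characters. They have a special meaning in Java character classes: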

[a-d[m-p]]     a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]   d, e, or f (intersection)
[a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]  a through z, and not m through p: [a-lq-z] (subtraction)
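The rows of this table can be checked directly. The following is a minimal sketch; the class name ClassOpsDemo and the helper m are made up for illustration and do not come from the original post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: exercises the union, intersection and subtraction forms listed above.
class ClassOpsDemo {
    // True if the whole input matches the regex
    static boolean m(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(m("[a-d[m-p]]", "n"));    // true: union includes m through p
        System.out.println(m("[a-d[m-p]]", "f"));    // false: f is in neither range
        System.out.println(m("[a-z&&[def]]", "e"));  // true: e is in both sets
        System.out.println(m("[a-z&&[def]]", "a"));  // false: a is not in [def]
        System.out.println(m("[a-z&&[^bc]]", "b"));  // false: b is subtracted
        System.out.println(m("[a-z&&[^bc]]", "d"));  // true
    }
}
```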

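So when you wrap your regex in [...], you get the union of the previous pattern with a literal * character, which means: match one character from [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}] or a literal *.

Besides, [[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*] is equal to [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}*], because inside a character class the * symbol is no longer a special character (a quantifier) and becomes a literal asterisk.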
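A small sketch of that effect, using a shortened stand-in class (letters and digits only) so that the asterisk is not already covered by \p{P}. The class and constant names are illustrative assumptions, not part of the question's code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative and not from the original post.
class NestedClassDemo {
    // Shortened stand-in for the question's character class: letters and digits only
    static final String ORIGINAL = "[\\p{L}\\p{N}]*";   // class followed by a quantifier
    static final String WRAPPED  = "[[\\p{L}\\p{N}]*]"; // union of the class and a literal '*'
    static final String FLAT     = "[\\p{L}\\p{N}*]";   // the same set written without nesting

    // True if the whole input matches the regex
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(ORIGINAL, "test123")); // true: any number of letters/digits
        System.out.println(matches(ORIGINAL, "*"));       // false: '*' is not a letter or digit
        System.out.println(matches(WRAPPED, "test123"));  // false: the pattern is exactly one character
        System.out.println(matches(WRAPPED, "t"));        // true
        System.out.println(matches(WRAPPED, "*"));        // true: '*' is a literal member of the class
        System.out.println(matches(FLAT, "*"));           // true: same set as WRAPPED
    }
}
```

With the quantifier swallowed into the class, the wrapped pattern always consumes exactly one character, which is why appending {0,30} to it starts to behave like a length check.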

If you use [[]], the engine throws an exception: Unclosed character class near index 3

See this IDEONE demo:

System.out.println("abc[]".replaceAll("[[abc]]", "")); // => []
System.out.println("abc[]".replaceAll("[[]]", "")); // => throws PatternSyntaxException
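If literal brackets are what you actually want inside a class, escaping them avoids the nested-class interpretation. A minimal sketch follows; the class name BracketEscapeDemo and the helper compiles are made up for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative and not from the original post.
class BracketEscapeDemo {
    // True if the regex compiles, false if the engine rejects it
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Removes literal '[' and ']' using an escaped character class
    static String stripBrackets(String input) {
        return input.replaceAll("[\\[\\]]", "");
    }

    public static void main(String[] args) {
        System.out.println(compiles("[[]]"));       // false: the outer class is never closed
        System.out.println(compiles("[\\[\\]]"));   // true: both brackets are escaped literals
        System.out.println(stripBrackets("abc[]")); // abc
    }
}
```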

Whenever you need to check the length of a string with a regex, you need anchors and a limiting quantifier. When the regex is used with the Matcher#matches method, the anchoring is implicit:

The matches method attempts to match the entire input sequence against the pattern.

Sample code:

String regex = "[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]";
String new_regex = regex + "{0,30}";
System.out.println("Some string".matches(new_regex)); // => true

See this IDEONE demo
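To see why the anchors matter, here is a rough sketch comparing matches() with find() on the same limited pattern. It uses a shortened stand-in class and a {0,5} limit to keep the inputs small; the class, field and method names are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative and not from the original post.
class LengthCheckDemo {
    // Shortened stand-in class: letters, digits and separators
    static final String CHAR_CLASS = "[\\p{L}\\p{N}\\p{Z}]";
    static final Pattern UNANCHORED = Pattern.compile(CHAR_CLASS + "{0,5}");
    static final Pattern ANCHORED = Pattern.compile("\\A" + CHAR_CLASS + "{0,5}\\z");

    // matches() requires the whole input to match, so no explicit anchors are needed
    static boolean wholeMatch(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // find() succeeds on any matching substring, so the length limit is not enforced
    static boolean findUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // find() with explicit anchors behaves like matches()
    static boolean findAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String tooLong = "Some string"; // 11 characters, over the limit of 5
        System.out.println(wholeMatch(tooLong));     // false
        System.out.println(findUnanchored(tooLong)); // true: a short prefix matches
        System.out.println(findAnchored(tooLong));   // false
        System.out.println(wholeMatch("abc"));       // true
    }
}
```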

Update

Here is your code with comments:

String userId = "";
testUserId(userId); // false - correct: the regex requires exactly one character and the string is empty
userId = "æÆbBcCćĆčČçďĎdzDzdzsDzs";
testUserId(userId); // false - correct: only a 1-character string can match, longer ones fail
userId = "test123";
testUserId(userId); // false - see above
userId = "abcxyzsd";
testUserId(userId); // false - see above

String zip = "i«♣│axy";
testZip(zip); // true - OK, 7-symbol string matches against [...]{0,10} regex
zip = "331fsdfsdfasdfasd02c3";
testZip(zip); // false - OK, 21-symbol string does not match a regex that requires only 0 to 10 characters
zip = "331";
testZip(zip); // true - OK, 3-symbol string matches against [...]{0,10} regex
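For comparison, one possible way to restructure the test so that both checks behave as intended is to keep the bare character class in one constant and append the quantifier outside it. This is only a sketch: the class and method names are made up, and the blank check uses trim().isEmpty() as a rough stand-in for StringUtils.isNotBlank so that the example needs no external library.

```java
import java.util.regex.Pattern;

// Hypothetical rewrite of the question's test; names are illustrative.
class FixedValidatorDemo {
    // The bare character class from the question, with no quantifier and no extra brackets
    static final String CHAR_CLASS = "[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]";
    static final Pattern ANY_LENGTH = Pattern.compile(CHAR_CLASS + "*");
    static final Pattern UP_TO_TEN = Pattern.compile(CHAR_CLASS + "{0,10}");

    // Without a range check: any number of allowed characters
    static boolean isValidUserId(String userId) {
        return userId != null && ANY_LENGTH.matcher(userId).matches();
    }

    // With a range check: blank values are skipped, as in the question's testZip
    static boolean isValidZip(String zip) {
        if (zip == null || zip.trim().isEmpty()) {
            return true; // assumption: approximates StringUtils.isNotBlank(zip) being false
        }
        return UP_TO_TEN.matcher(zip).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUserId(""));                         // true
        System.out.println(isValidUserId("\u00E6\u00C6bBcC\u0107\u0106")); // true: all letters
        System.out.println(isValidUserId("test123"));                  // true
        System.out.println(isValidZip("i\u00AB\u2663\u2502axy"));      // true: 7 characters
        System.out.println(isValidZip("331fsdfsdfasdfasd02c3"));       // false: 21 characters
        System.out.println(isValidZip("331"));                         // true
    }
}
```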

Regarding "java - How do [] change a Java regular expression?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33766124/
