
Do Java regular expressions support non-ASCII values?

Repost. Author: 行者123. Updated: 2023-11-30 09:58:30

We currently have a method that strips every character that is not a letter or a space. It is simple enough:

String clean(String input)
{
    // Removes everything except ASCII letters and spaces; null becomes "".
    return input == null ? "" : input.replaceAll("[^a-zA-Z ]", "");
}

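It really should be fixed to support non-English characters (for example ś, ũ, ...). Unfortunately, the Java regex character classes (for example "\W", a non-word character, or "\p{Alpha}", which is US-ASCII only) do not seem to support this. Is there a way to do this with a Java regex, rather than looping over every character manually and testing it?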

Best answer

Java 6 patterns handle Unicode; see the java.util.regex.Pattern documentation:

Unicode escape sequences such as \u2014 in Java source code are processed as described in §3.3 of the Java Language Specification. Such escape sequences are also implemented directly by the regular-expression parser so that Unicode escapes can be used in expressions that are read from files or from the keyboard. Thus the strings "\u2014" and "\\u2014", while not equal, compile into the same pattern, which matches the character with hexadecimal value 0x2014.
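To make the point about the two escape forms concrete, here is a small sketch. The class name UnicodeEscapeDemo and the helper bothMatch are made up for illustration. It only shows that the two string literals differ while the compiled patterns match the same character, U+2014.

```java
import java.util.regex.Pattern;

class UnicodeEscapeDemo {
    // One backslash: the Java compiler resolves the escape, so the regex
    // parser receives a single character (U+2014).
    static final String COMPILER_FORM = "\u2014";

    // Two backslashes: the string holds six characters (backslash, u, 2, 0, 1, 4)
    // and the regex parser resolves the escape itself.
    static final String REGEX_FORM = "\\u2014";

    // Hypothetical helper: true if both patterns match the whole input.
    static boolean bothMatch(String input) {
        return Pattern.compile(COMPILER_FORM).matcher(input).matches()
            && Pattern.compile(REGEX_FORM).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(COMPILER_FORM.equals(REGEX_FORM));
        System.out.println(bothMatch(COMPILER_FORM));
    }
}
```

The second form is the one that applies when a pattern is read from a file or typed by a user, since no Java compiler is involved there.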

Unicode blocks and categories are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property. Blocks are specified with the prefix In, as in InMongolian. Categories may be specified with the optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and categories can be used both inside and outside of a character class.
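Applied to the method in the question, the \p{L} category could replace the a-zA-Z range. The following is a minimal sketch, not necessarily what the answerer had in mind. The class name UnicodeLetterClean is an assumption, and the sample input uses Unicode escapes for ś (U+015B) and ũ (U+0169) so that it does not depend on the source file encoding.

```java
class UnicodeLetterClean {
    // Keeps any Unicode letter (category L) and the plain space character;
    // every other character is removed. null becomes "".
    static String clean(String input) {
        return input == null ? "" : input.replaceAll("[^\\p{L} ]", "");
    }

    public static void main(String[] args) {
        // Sample input: 'a', U+015B, space, U+0169, then '-', '1', '!'
        System.out.println(clean("a\u015B \u0169-1!"));
    }
}
```

One caveat: if the input is in decomposed form (a base letter followed by a combining mark), the combining mark belongs to category M rather than L and would be stripped. Adding \p{M} to the character class, or normalizing the input first, is a possible way around that.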

Regarding "Do Java regular expressions support non-ASCII values?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/917774/
