
java - Converting a PCRE regular expression into one supported by C# or Java

Reposted. Author: 行者123. Updated: 2023-11-30 10:04:07

Business requirement: an address needs to be parsed into street, house number, and address line 2.

Sample single-line addresses:

Bygholm Søpark 21B, 
Peder Skrams Gade 9 3. tv.,
Willemoesgade 29 kid.

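The PCRE regular expression below works for the business scenario above. I need to take this regex and create a Java method that accepts an input parameter (a single-line address) and returns the regex groups (street, house number, and address line 2) as output. Can anyone help me with this?

Regular expression: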

/
\A\s*
(?: #########################################################################
# Option A: [<Addition to address 1>] <House number> <Street name> #
# [<Addition to address 2>] #
#########################################################################
(?:(?P<A_Addition_to_address_1>.*?),\s*)? # Addition to address 1
(?:No\.\s*)?
(?P<A_House_number_1>\pN+[a-zA-Z]?(?:\s*[-\/\pP]\s*\pN+[a-zA-Z]?)*) # House number
\s*,?\s*
(?P<A_Street_name_1>(?:[a-zA-Z]\s*|\pN\pL{2,}\s\pL)\S[^,#]*?(?<!\s)) # Street name
\s*(?:(?:[,\/]|(?=\#))\s*(?!\s*No\.)
(?P<A_Addition_to_address_2>(?!\s).*?))? # Addition to address 2
| #########################################################################
# Option B: [<Addition to address 1>] <Street name> <House number> #
# [<Addition to address 2>] #
#########################################################################
(?:(?P<B_Addition_to_address_1>.*?),\s*(?=.*[,\/]))? # Addition to address 1
(?!\s*No\.)(?P<B_Street_name>\S\s*\S(?:[^,#](?!\b\pN+\s))*?(?<!\s)) # Street name
\s*[\/,]?\s*(?:\sNo\.)?\s+
(?P<B_House_number>\pN+\s*-?[a-zA-Z]?(?:\s*[-\/\pP]?\s*\pN+(?:\s*[\-a-zA-Z])?)*|[IVXLCDM]+(?!.*\b\pN+\b))(?<!\s) # House number
\s*(?:(?:[,\/]|(?=\#)|\s)\s*(?!\s*No\.)\s*
(?P<B_Addition_to_address_2>(?!\s).*?))? # Addition to address 2
)
\s*\Z

https://regex101.com/library/lU7gY7

Java method:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class regEx {
public static void main( String args[] ) {
// String to be scanned to find the pattern.
String line = "Bygholm Søpark 21B";
String pattern = "\\A\\s*\r\n" +
"(?: #########################################################################\r\n" +
" # Option A: [<Addition to address 1>] <House number> <Street name> #\r\n" +
" # [<Addition to address 2>] #\r\n" +
" #########################################################################\r\n" +
" (?:(?:P<A_Addition_to_address_1>.*?),\\s*)? # Addition to address 1\r\n" +
"(?:No\\.\\s*)?\r\n" +
" (?:P<A_House_number_1>\\pN+[a-zA-Z]?(?:\\s*[-\\/\\pP]\\s*\\pN+[a-zA-Z]?)*) # House number\r\n" +
"\\s*,?\\s*\r\n" +
" (?:P<A_Street_name_1>(?:[a-zA-Z]\\s*|\\pN\\pL{2,}\\s\\pL)\\S[^,#]*?(?<!\\s)) # Street name\r\n" +
"\\s*(?:(?:[,\\/]|(?=\\#))\\s*(?!\\s*No\\.)\r\n" +
" (?:P<A_Addition_to_address_2>(?!\\s).*?))? # Addition to address 2\r\n" +
"| #########################################################################\r\n" +
" # Option B: [<Addition to address 1>] <Street name> <House number> #\r\n" +
" # [<Addition to address 2>] #\r\n" +
" #########################################################################\r\n" +
" (?:(?:P<B_Addition_to_address_1>.*?),\\s*(?=.*[,\\/]))? # Addition to address 1\r\n" +
" (?:!\\s*No\\.)(?:P<B_Street_name>\\S\\s*\\S(?:[^,#](?!\\b\\pN+\\s))*?(?:<!\\s)) # Street name\r\n" +
"\\s*[\\/,]?\\s*(?:\\sNo\\.)?\\s+\r\n" +
" (?:P<B_House_number>\\pN+\\s*-?[a-zA-Z]?(?:\\s*[-\\/\\pP]?\\s*\\pN+(?:\\s*[\\-a-zA-Z])?)*|[IVXLCDM]+(?!.*\\b\\pN+\\b))(?<!\\s) # House number\r\n" +
"\\s*(?:(?:[,\\/]|(?=\\#)|\\s)\\s*(?!\\s*No\\.)\\s*\r\n" +
" (?:P<B_Addition_to_address_2>(?!\\s).*?))? # Addition to address 2\r\n" +
")\r\n" +
"\\s*\\Z";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create a matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("B_Street_name: " + m.group(1) );
System.out.println("B_House_number: " + m.group(2) );
System.out.println("B_Addition_to_address_2: " + m.group(3) );
}else {
System.out.println("NO MATCH");
}
}
}

Best answer

There are several things to keep in mind.

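  • Named capturing groups: in Java the syntax is (?<name>pattern), and a name may contain only ASCII letters and digits and must start with a letter (see "I can't use a group name like this "abc_def" using Patterns"). Replace every (?P<name_parts>...) with (?<nameparts>...).
  • Use of #: in many flavors other than Java, free-spacing mode allows a literal, unescaped # inside a character class. In Java, any meaningful whitespace and any literal # must be escaped, even inside a character class (replace every literal # in the pattern, including those in character classes, with \\# in the Java string).
  • Pattern.COMMENTS enables free-spacing/comment mode in Java. Alternatively, add (?x) at the start of the pattern.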
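
The three points above can be checked in isolation with a small sketch. The pattern, the class name FreeSpacingDemo, and the helper methods below are hypothetical and exist only for illustration; they are not part of the address regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class FreeSpacingDemo {
    // Hypothetical pattern for illustration: "<number> <# or ,> <unit>".
    // Pattern.COMMENTS ignores unescaped whitespace and treats '#' as the
    // start of a comment that runs to the end of the line.
    static final Pattern NUMBER_AND_UNIT = Pattern.compile(
        "(?<houseNumber>\\d+)   # group name: letters and digits only\n" +
        "\\s*[\\#,]\\s*          # a literal '#' is escaped, even in a class\n" +
        "(?<unit>\\w+)          # trailing unit\n",
        Pattern.COMMENTS);

    // Returns {houseNumber, unit}, or null if the whole input does not match.
    static String[] split(String input) {
        Matcher m = NUMBER_AND_UNIT.matcher(input);
        return m.matches()
            ? new String[] { m.group("houseNumber"), m.group("unit") }
            : null;
    }

    // Reports whether Java accepts the given regex syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", split("21 # B")));
        System.out.println(compiles("(?P<a_b>x)")); // PCRE/Python-style syntax
        System.out.println(compiles("(?<a_b>x)"));  // underscore in the name
        System.out.println(compiles("(?<ab>x)"));   // Java-compatible name
    }
}
```

Only the last of the three compiles(...) calls should report true, which is why the group names lose both the P and the underscores when the pattern is ported.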

Here is your code, fixed:

String line = "Bygholm Søpark 21B";
String pattern = "\\A\\s*\r\n" +
"(?: #########################################################################\r\n" +
" # Option A: [<Addition to address 1>] <House number> <Street name> #\r\n" +
" # [<Addition to address 2>] #\r\n" +
" #########################################################################\r\n" +
" (?:(?<AAdditiontoaddress1>.*?),\\s*)? # Addition to address 1\r\n" +
"(?:No\\.\\s*)?\r\n" +
" (?<AHousenumber1>\\pN+[a-zA-Z]?(?:\\s*[-/\\pP]\\s*\\pN+[a-zA-Z]?)*) # House number\r\n" +
"\\s*,?\\s*\r\n" +
" (?<AStreetname1>(?:[a-zA-Z]\\s*|\\pN\\pL{2,}\\s\\pL)\\S[^,\\#]*?(?<!\\s)) # Street name\r\n" +
"\\s*(?:(?:[,/]|(?=\\#))\\s*(?!\\s*No\\.)\r\n" +
" (?<AAdditiontoaddress2>(?!\\s).*?))? # Addition to address 2\r\n" +
"| #########################################################################\r\n" +
" # Option B: [<Addition to address 1>] <Street name> <House number> #\r\n" +
" # [<Addition to address 2>] #\r\n" +
" #########################################################################\r\n" +
" (?:(?<BAdditiontoaddress1>.*?),\\s*(?=.*[,/]))? # Addition to address 1\r\n" +
" (?!\\s*No\\.)(?<BStreetname>\\S\\s*\\S(?:[^,\\#](?!\\b\\pN+\\s))*?(?<!\\s)) # Street name\r\n" +
"\\s*[/,]?\\s*(?:\\sNo\\.)?\\s+\r\n" +
" (?<BHousenumber>\\pN+\\s*-?[a-zA-Z]?(?:\\s*[-/\\pP]?\\s*\\pN+(?:\\s*[-a-zA-Z])?)*|[IVXLCDM]+(?!.*\\b\\pN+\\b))(?<!\\s) # House number\r\n" +
"\\s*(?:(?:[,/]|(?=\\#)|\\s)\\s*(?!\\s*No\\.)\\s*\r\n" +
" (?<BAdditiontoaddress2>(?!\\s).*?))? # Addition to address 2\r\n" +
")\r\n" +
"\\s*\\Z";

// Create a Pattern object
Pattern r = Pattern.compile(pattern, Pattern.COMMENTS);
// Now create a matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("B_Street_name: " + m.group("BStreetname") );
System.out.println("B_House_number: " + m.group("BHousenumber") );
System.out.println("B_Addition_to_address_2: " + m.group("BAdditiontoaddress2") );
} else {
System.out.println("NO MATCH");
}

See the Java demo online.

Output:

B_Street_name: Bygholm Søpark
B_House_number: 21B
B_Addition_to_address_2: null
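
The snippet above prints only the Option B groups, whereas the question asks for a method that returns the parts whichever option matched. One possible shape for such a method is sketched below. To keep it short it uses a deliberately simplified two-option pattern as a stand-in for the full regex (an assumption, and it handles far fewer address formats); the class name AddressSplitter, the group name BAddition, and the helper firstNonNull are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AddressSplitter {
    // Simplified stand-in for the full pattern (assumption, for brevity):
    //   Option A: <house number> <street name>
    //   Option B: <street name> <house number> [<addition>]
    private static final Pattern ADDRESS = Pattern.compile(
        "\\A\\s*(?:\n" +
        "    (?<AHousenumber>\\d+[a-zA-Z]?) \\s+ (?<AStreetname>\\D.*?)   # Option A\n" +
        "  |\n" +
        "    (?<BStreetname>\\D.*?) \\s+ (?<BHousenumber>\\d+[a-zA-Z]?)   # Option B\n" +
        "    (?: \\s+ (?<BAddition>\\S.*?) )?\n" +
        ")\\s*\\z",
        Pattern.COMMENTS);

    // Returns {street, houseNumber, addition}; the addition may be null.
    // Returns null when the line does not match either option.
    static String[] parse(String line) {
        Matcher m = ADDRESS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            firstNonNull(m.group("AStreetname"), m.group("BStreetname")),
            firstNonNull(m.group("AHousenumber"), m.group("BHousenumber")),
            m.group("BAddition")
        };
    }

    // Groups of the alternative that did not take part in the match are null.
    private static String firstNonNull(String a, String b) {
        return a != null ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Peder Skrams Gade 9 3. tv.")));
        System.out.println(Arrays.toString(parse("Willemoesgade 29 kid.")));
    }
}
```

With the full pattern from the answer, the same coalescing step would apply to its A and B group pairs (including the two addition groups per option); only the Pattern constant and the group names would change.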

Regarding "java - Converting a PCRE regular expression into one supported by C# or Java", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56040334/
