
java - A working Regex misses some strings it should find, and I need help determining why

Reposted. Author: 行者123. Updated: 2023-11-30 11:04:37

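I have a set of strings that I loop through, checking each one against the set of regular expressions below to try to separate the first small part from the rest of the string. The regex works in almost all cases, but unfortunately I don't know why it occasionally fails. I have been using a Pattern matcher to print out the string whenever the pattern is found.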

Two example strings that work:

98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials; inflorescence …

Two example strings that fail:

100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …
26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials with or without stolons or rhizomes; sheaths overlapping or some …

The regular expressions used so far:

Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusTwo = Pattern.compile("(?<=(^\\d+" + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusThree = Pattern.compile("(?<=(\\d+\\. " + genusNames[l] + "))");
Pattern endOfGenusFour = Pattern.compile("(?<=(\\d+" + genusNames[l] + "))");
Pattern endOfGenusFive = Pattern.compile("(?<=(\\. " + genusNames[l] + "))");

The first of these is the one that has produced reliable results so far.

Sample code

Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Matcher endOfGenusFinder = endOfGenus.matcher(descriptionPartBits[b]);
if (endOfGenusFinder.find()) {
    System.out.print(descriptionPartBits[b] + ":- ");
    System.out.print(genusNames[l] + "\n");
    String[] genusNameBits = descriptionPartBits[b].split("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
}

Desired output. This is what the working strings produce. The strings that don't work do not appear in the output at all:

98. SORGHUM Moench - Millets Annuals or rhizomatous perennials:- Sorghum
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials:- Miscanthus

Best answer

From the regex tutorial:

Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions just like the start and end of line, and start and end of word anchors explained earlier in this tutorial.
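To make the "zero-length" point concrete, here is a minimal sketch. The class name ZeroLengthDemo and the helper firstMatchSpan are made up for illustration, and the pattern is simplified to a fixed-width lookbehind on the literal ZEA instead of the pattern from the question. It shows that a pattern consisting only of a lookbehind matches a position, not any text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class ZeroLengthDemo {

    // Returns {start, end} of the first match of the lookbehind-only pattern,
    // or null when the text contains no "ZEA" to look back at.
    static int[] firstMatchSpan(String text) {
        Matcher m = Pattern.compile("(?<=ZEA)").matcher(text);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        int[] span = firstMatchSpan("100. ZEA L. - Maize");
        // start and end are equal: the match is a position right after "ZEA"
        // and contains no characters of its own.
        System.out.println(span[0] + ".." + span[1]); // prints 8..8
    }
}
```

Because the match itself is empty, matcher.group() would return an empty string here. Any text you want back has to come from a capturing group or from a part of the pattern outside the lookbehind.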

Lookahead and lookbehind only return true or false; they do not consume any characters. So I changed your code example:

Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. ZEA L))(.+)$");
// Matcher matcher = endOfGenus.matcher("98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …");
Matcher matcher = endOfGenus.matcher("100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …");
while (matcher.find()) {
    String group1 = matcher.group(1);
    String group2 = matcher.group(2);
    System.out.println("group1=" + group1);
    System.out.println("group2=" + group2);
}

Group 1 matches (^\\d+\\. ZEA L). Group 2 matches (.+).
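As a variation on the same idea, the following sketch drops the lookbehind entirely and relies on two ordinary capturing groups. This is a swapped-in technique, not the code from the answer. The class GenusSplitter and the method splitAtGenus are hypothetical names, and wrapping the genus name in Pattern.quote is an added precaution, on the assumption that genus names should be treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class GenusSplitter {

    // Splits "<number>. <GENUS><rest>" into { "<number>. <GENUS>", "<rest>" }.
    // Returns null when the line does not start with a number, a dot, a space
    // and the upper-cased genus name.
    static String[] splitAtGenus(String line, String genusName) {
        // Pattern.quote treats the name as literal text (assumption: names
        // should never be interpreted as regex syntax).
        String genus = Pattern.quote(genusName.toUpperCase());
        Pattern p = Pattern.compile("^(\\d+\\. " + genus + ")(.*)$", Pattern.DOTALL);
        Matcher m = p.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "98. SORGHUM Moench - Millets Annuals or rhizomatous perennials",
            "100. ZEA L. - Maize Annuals",
            "26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials"
        };
        String[] genusNames = { "Sorghum", "Zea", "Poa" };
        for (int i = 0; i < lines.length; i++) {
            String[] parts = splitAtGenus(lines[i], genusNames[i]);
            System.out.println(parts == null
                    ? "no match: " + lines[i]
                    : "[" + parts[0] + "] [" + parts[1] + "]");
        }
    }
}
```

For the input "100. ZEA L. - Maize Annuals" with the genus "Zea", this returns "100. ZEA" and " L. - Maize Annuals". The leading space stays in the second part, so trim it if it is not wanted.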

Regarding "java - A working Regex misses some strings it should find, and I need help determining why", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29973488/
