java - Returning a word's position using a regular expression

Reposted. Author: 行者123. Updated: 2023-11-30 03:08:14

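I am having trouble returning the position of a word using a regular expression and the Matcher methods in Java.

Suppose I have the sentence "The Quick Brown Fox Jumps Over the Laziest Dog in the World". With my current regular expression, I want to return the position of a specific word.

Suppose the input is "brown": from the example above it should return 3, because it is the third word in the sentence. If it is "quick", it should return 2, the second word in the sentence. If it is "world", it should return 12. I hope I have given enough examples.

My attempt is: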

Pattern p = Pattern.compile("(?i)(?<=^|[^A-Z0-9a-z])enemy(?=$|[^A-Z0-9a-z])");
Matcher m = p.matcher("The quickman is an enemy from megaman.");
if (m.find()) {
    System.out.println(m.start());
    System.out.println(m.end());
    System.out.println(m.group());
}

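But matcher.start() only returns the character index into the string (19 for this input), not the position of the word. Any hints or help would be appreciated.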
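To make the gap between a character offset and a word position concrete, here is a minimal sketch that keeps the attempt above and converts m.start() into a word position. The helper name wordPositionAt is hypothetical (it is not part of java.util.regex), and the sketch assumes that words are separated by whitespace and that the match begins at the start of a token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetToWordPosition {
    // Hypothetical helper: converts a character offset into a 1-based word
    // position by counting the whitespace-separated tokens that start before it.
    public static int wordPositionAt(String text, int offset) {
        Matcher tokens = Pattern.compile("\\S+").matcher(text);
        int position = 1;
        while (tokens.find() && tokens.start() < offset) {
            position++;
        }
        return position;
    }

    public static void main(String[] args) {
        String text = "The quickman is an enemy from megaman.";
        Pattern p = Pattern.compile("(?i)(?<=^|[^A-Z0-9a-z])enemy(?=$|[^A-Z0-9a-z])");
        Matcher m = p.matcher(text);
        if (m.find()) {
            System.out.println(m.start());                       // character offset
            System.out.println(wordPositionAt(text, m.start())); // word position
        }
    }
}
```

The second line printed is 5, since enemy is the fifth whitespace-separated word. If the match could start in the middle of a token (for example right after an opening parenthesis), the count would be off by one, so this only suits input where the assumption above holds.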

Best answer

Here is an example for the word brown:

\b(?:(brown)|(\S+))\b

// \b(?:(brown)|(\S+))\b
//
// Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks
//
// Assert position at a word boundary (position preceded or followed—but not both—by a Unicode letter, digit, or underscore) «\b»
// Match the regular expression below «(?:(brown)|(\S+))»
//    Match this alternative (attempting the next alternative only if this one fails) «(brown)»
//       Match the regex below and capture its match into backreference number 1 «(brown)»
//          Match the character string “brown” literally (case sensitive) «brown»
//    Or match this alternative (the entire group fails if this one fails to match) «(\S+)»
//       Match the regex below and capture its match into backreference number 2 «(\S+)»
//          Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\S+»
//             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Assert position at a word boundary (position preceded or followed—but not both—by a Unicode letter, digit, or underscore) «\b»

A sample program that finds brown:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class HelloWorld
{
    public static void main(String[] args)
    {
        // 1-based position of the word currently being visited.
        int counter = 0;
        String subjectString = "The quick brown fox jumps over the laziest dog in the world";
        String testWordString = "brown";
        try {
            // Group 1 captures the target word; group 2 captures any other word.
            Pattern regex = Pattern.compile("\\b(?:(brown)|(\\S+))\\b");
            Matcher regexMatcher = regex.matcher(subjectString);
            while (regexMatcher.find()) {
                // Increment the count for each word we pass.
                counter++;

                // matched text: regexMatcher.group()
                // match start:  regexMatcher.start()
                // match end:    regexMatcher.end()

                System.out.println(regexMatcher.group());

                // If the matched text equals our target word `brown`, report its position and exit the loop.
                if (testWordString.equals(regexMatcher.group())) {
                    System.out.println("found the word: " + counter);
                    break;
                }
            }
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression: report it instead of silently ignoring it.
            System.err.println("Invalid regular expression: " + ex.getMessage());
        }
    }
}

Output:

The
quick
brown
found the word: 3

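Note that the example can be simplified by removing the explicit test for brown, changing:

\b(?:(brown)|(\S+))\b

to:

\b(\S+)\b

But my thinking was to let you use the separate capture groups to tell whether you have found a match, instead of doing a string comparison against brown on every iteration.

I will leave that as an exercise for you.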
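For readers who want to check their attempt, here is one possible sketch of that exercise, not necessarily what the answerer had in mind. It assumes a hypothetical helper named wordPosition, builds the pattern from the search word with Pattern.quote, and adds Pattern.CASE_INSENSITIVE so that "brown" also finds "Brown" in the question's sentence. It further assumes the search word begins and ends with a letter, digit, or underscore, because the pattern relies on \b.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPosition {
    // Hypothetical helper (not from the original answer): returns the 1-based
    // position of `word` in `sentence`, or -1 if the word does not occur.
    public static int wordPosition(String sentence, String word) {
        // Group 1 matches the target word (quoted so it is treated literally);
        // group 2 matches any other run of non-whitespace characters.
        Pattern regex = Pattern.compile(
                "\\b(?:(" + Pattern.quote(word) + ")|(\\S+))\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = regex.matcher(sentence);
        int counter = 0;
        while (m.find()) {
            counter++;
            // group(1) is non-null only when the first alternative matched,
            // so no string comparison is needed here.
            if (m.group(1) != null) {
                return counter;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String sentence = "The Quick Brown Fox Jumps Over the Laziest Dog in the World";
        System.out.println(wordPosition(sentence, "brown")); // 3
        System.out.println(wordPosition(sentence, "quick")); // 2
        System.out.println(wordPosition(sentence, "world")); // 12
    }
}
```

Returning -1 for a missing word is a design choice of this sketch. Tokens that contain punctuation, such as "brown-ish", may be counted as more than one word by this pattern, so real input might need a stricter definition of a word.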

Regarding java - returning a word's position using a regular expression, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34216036/
