
java - Regular expression in Java for finding duplicate consecutive words

Reposted. Author: 搜寻专家. Updated: 2023-10-30 21:14:50

I saw this as an answer for finding repeated words in a string. But when I use it, it treats "This" and "is" as the same and removes "is".

Regex

"\\b(\\w+)\\b\\s+\\1"

Any idea why this is happening?

Here is the code I am using to remove the duplicates:

public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}

Best answer

Try this:

String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);
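
A single-pass variant, offered as a minimal sketch rather than as the answer's own method: `Matcher.replaceAll` accepts `$1` as a reference to the first captured group, so the explicit loop over `find()` is not needed. The class name `DuplicateWords` and the method name `collapse` are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class DuplicateWords {
    // Same pattern as in the answer; the inline (?i) is dropped here because
    // the CASE_INSENSITIVE flag already covers it.
    private static final Pattern REPEATED = Pattern.compile(
            "\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replaces each run of repeated words with its first occurrence ($1).
    static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("This is is a a test test")); // This is a test
        System.out.println(collapse("Hello hello world"));        // Hello world
    }
}
```

Because the replacement is applied by the matcher itself, only the matched runs are rewritten; other occurrences of the same text elsewhere in the string are left alone.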

Java regular expressions are explained very well in the API documentation of the Pattern class. After adding some spaces to indicate the different parts of the regular expression:

"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

\b        match a word boundary
[a-z]+    match a word with one or more characters;
          the parentheses capture the word as a group
\b        match a word boundary
(?:       indicates a non-capturing group (which starts here)
\s+       match one or more white space characters
\1        is a back reference to the first (captured) group;
          so the word is repeated here
\b        match a word boundary
)+        indicates the end of the non-capturing group and
          allows it to occur one or more times
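
To see what the trailing `\b` contributes, the sketch below compares the pattern from the question (no boundary after `\1`) with a bounded version on a sample sentence. The class name `BoundaryDemo`, the helper `firstMatch`, and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class BoundaryDemo {
    // Returns the first substring of text matched by regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "This is island";
        // No boundary after \1: "is" followed by the "is" of "island" counts as a repeat.
        System.out.println(firstMatch("\\b(\\w+)\\b\\s+\\1", text));    // is is
        // Boundary after \1: the repeat must be a whole word, so nothing matches.
        System.out.println(firstMatch("\\b(\\w+)\\b\\s+\\1\\b", text)); // null
    }
}
```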

Regarding "java - Regular expression in Java for finding duplicate consecutive words", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9147270/
