java - Problem with a code snippet that uses the boundary matcher regex (\b)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:00:21

My input:

1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end

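Based on the earlier discussion, when I use the following snippet to replace a single string, it successfully removes the matching word from each line: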

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class DeleteTest {

    public static void main(String[] args) {
        try {
            File file = new File("C:/Java samples/myfile.txt");
            // The result is written to a temp file next to the original
            File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
            String delete = "end";
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));

            for (String line; (line = reader.readLine()) != null;) {
                // \b on both sides: remove "end" only when it is a whole word
                line = line.replaceAll("\\b" + delete + "\\b", "");
                writer.println(line);
            }
            reader.close();
            writer.close();
        }
        catch (Exception e) {
            System.out.println("Something went wrong: " + e);
        }
    }
}

My output with the snippet above (which is also my expected output):

1.
2. of the day or of the week
3. endline
4. something
5. "something"

But when I want to delete more words and put them in a Set for that purpose, I use the following snippet:

// Requires java.util.HashSet and java.util.Set in addition to the java.io imports.
public static void main(String[] args) {
    try {
        File file = new File("C:/Java samples/myfile.txt");
        File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));

        // The words to delete
        Set<String> toDelete = new HashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        for (String line; (line = reader.readLine()) != null;) {
            // Concatenating the Set inserts its toString() form, e.g. "[end, something]"
            line = line.replaceAll("\\b" + toDelete + "\\b", "");
            writer.println(line);
        }
        reader.close();
        writer.close();
    }
    catch (Exception e) {
        System.out.println("Something went wrong: " + e);
    }
}

My output is (it only removes the spaces):

1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end

Can you help me fix this?
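A minimal, file-free reproduction of the second snippet helps show what goes wrong (the class and method names below are illustrative, not from the question). Concatenating a Set into a string calls its toString(), so the regex becomes something like \b[end, something]\b. The square brackets form a character class that matches any single character among e, n, d, comma, space, s, o, m, t, h, i and g, regardless of the Set's iteration order. Only a lone space between two words is surrounded by word boundaries on both sides, so the spaces are all that gets removed.

```java
import java.util.HashSet;
import java.util.Set;

class SetToStringPitfall {
    // Builds the regex the same way the question's second snippet does:
    // the Set is inserted via its toString(), e.g. "[end, something]".
    static String buggyPattern(Set<String> toDelete) {
        return "\\b" + toDelete + "\\b";
    }

    public static void main(String[] args) {
        Set<String> toDelete = new HashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        String pattern = buggyPattern(toDelete);
        // Prints a single character class wrapped in \b, not a list of words
        System.out.println(pattern);

        // A space between two words has a \b on each side, so it matches;
        // letters inside words do not (no boundary between two letters).
        String line = "2. end of the day or end of the week";
        System.out.println(line.replaceAll(pattern, "")); // prints 2. endofthedayorendoftheweek
    }
}
```

The space after "2." survives because the period and the space are both non-word characters, so there is no \b between them.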


Best answer

You need to create an alternation group out of the set with

String.join("|", toDelete)

and use it as

line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");

The pattern will look like

\b(?:end|something)\b

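See the regex demo. Here, (?:...) is a non-capturing group: it groups several alternatives without allocating a memory buffer for a capture (you don't need one, since you are deleting the matches).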
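As a quick sanity check, here is a small in-memory sketch that applies the alternation to the five sample lines from the question (the class and method names are illustrative). It assumes the words contain only letters, digits or underscores, i.e. no regex metacharacters. Because "something" is now in the set as well, line 4 and the quoted word on line 5 are also removed, and deleting a word leaves the surrounding spaces in place, so line 2 ends up with double spaces.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class AlternationDemo {
    // Removes every whole-word occurrence of any word in toDelete,
    // using \b(?:word1|word2|...)\b. Assumes plain words without
    // regex metacharacters.
    static String deleteWords(String line, Set<String> toDelete) {
        String regex = "\\b(?:" + String.join("|", toDelete) + ")\\b";
        return line.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps insertion order, so the alternation is end|something
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("end", "something"));
        String[] lines = {
            "1. end",
            "2. end of the day or end of the week",
            "3. endline",
            "4. something",
            "5. \"something\" end"
        };
        for (String line : lines) {
            // Brackets make leftover leading/trailing spaces visible
            System.out.println("[" + deleteWords(line, toDelete) + "]");
        }
    }
}
```

"endline" is left alone because there is no word boundary between "end" and "l".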

Or, better still, compile the regex once before entering the loop:

Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
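Putting the precompiled pattern back into the question's read/write loop might look like the sketch below. It swaps the manual close() calls for try-with-resources and uses java.nio.file.Files instead of the java.io stream chain; the class name, method name and the temp-file sample input are hypothetical stand-ins for myfile.txt, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

class FileWordRemover {
    // Copies `in` to `out`, deleting whole-word matches line by line.
    // The Pattern is compiled once, outside the loop, and reused for every line.
    static void removeWords(Path in, Path out, Set<String> toDelete) throws IOException {
        Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String line; (line = reader.readLine()) != null; ) {
                writer.write(pat.matcher(line).replaceAll(""));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input in a temp file, standing in for myfile.txt
        Path in = Files.createTempFile("myfile", ".txt");
        Path out = Files.createTempFile("myfile1", ".txt");
        in.toFile().deleteOnExit();
        out.toFile().deleteOnExit();
        Files.write(in, Arrays.asList("1. end", "3. endline", "4. something"), StandardCharsets.UTF_8);

        removeWords(in, out, new LinkedHashSet<>(Arrays.asList("end", "something")));
        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```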

UPDATE:

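To allow matching whole "words" that may contain special characters, you need to Pattern.quote the words to escape those special characters. Then you need unambiguous word boundaries: the negative lookbehind (?<!\w) instead of the initial \b, to make sure there is no word char before the match, and the negative lookahead (?!\w) instead of the final \b, to make sure there is no word char after the match.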
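To see why \b is not enough once a word starts or ends with a non-word character, here is a small comparison on made-up sample strings, using the same example words as the pattern shown further down (+end and something-). Both methods quote the word with Pattern.quote; only the boundaries differ. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class BoundaryComparison {
    // Whole-word removal with \b on both sides of the quoted word
    static String withWordBoundary(String line, String word) {
        return line.replaceAll("\\b" + Pattern.quote(word) + "\\b", "");
    }

    // Whole-word removal with unambiguous boundaries:
    // (?<!\w) = no word char before, (?!\w) = no word char after
    static String withLookarounds(String line, String word) {
        return line.replaceAll("(?<!\\w)" + Pattern.quote(word) + "(?!\\w)", "");
    }

    public static void main(String[] args) {
        // "+end" after a space: \b fails (space and '+' are both non-word chars)
        System.out.println(withWordBoundary("x +end y", "+end")); // prints x +end y
        System.out.println(withLookarounds("x +end y", "+end"));  // prints "x  y" (two spaces)

        // "something-" inside "something-else": \b wrongly matches ('-' then 'e')
        System.out.println(withWordBoundary("something-else", "something-")); // prints else
        System.out.println(withLookarounds("something-else", "something-"));  // prints something-else
    }
}
```

With \b, +end is never found after a space, and something- is wrongly removed from inside something-else; the lookarounds handle both cases.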

In Java 8, you can use the following code:

Set<String> nToDel = toDelete.stream()
        .map(Pattern::quote)   // escape special characters in each word
        .collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";

For the words +end and something-, the regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
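A self-contained variant of the Java 8 snippet might look like this. It swaps the intermediate Set plus String.join for Collectors.joining("|"), which builds the same alternation in one pass, and uses a LinkedHashSet so the order of alternatives is predictable. The class name, method name and sample line are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedWordRemover {
    // Builds (?<!\w)(?:\Q...\E|\Q...\E)(?!\w) from arbitrary words
    static Pattern buildPattern(Set<String> toDelete) {
        String alternatives = toDelete.stream()
                .map(Pattern::quote)              // wrap each word in \Q...\E
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternatives + ")(?!\\w)");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("+end", "something-"));
        Pattern pat = buildPattern(toDelete);

        // prints (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w)
        System.out.println(pat.pattern());

        // "something-else" is kept: the hyphen is followed by a word char
        System.out.println(pat.matcher("a +end b something- c something-else").replaceAll(""));
        // prints "a  b  c something-else"
    }
}
```

As before, the pattern is compiled once and can be reused inside the line loop via pat.matcher(line).replaceAll("").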

Regarding "java - Problem with a code snippet that uses the boundary matcher regex (\b)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46972476/
