
java - Extract sentences containing specific words

Reposted. Author: 行者123. Updated: 2023-12-01 22:55:50

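I want to extract the sentences in a text file that contain specific keywords. I have tried many things but cannot get the right sentences. I have more than one set of keywords, and a sentence should be extracted if any one of them matches. For example, if my text file contains words such as robbery or robbed, the sentence containing that word should be extracted. Below is the code I have tried. Is there a way to solve this with a regular expression? Any help would be appreciated.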

BufferedReader br1 = new BufferedReader(new FileReader("/home/pgrms/Documents/test/one.txt"));
String str = "";

// Read the whole file into one string, keeping the line breaks
while (br1.ready()) {
    str += br1.readLine() + "\n";
}

// Sentence pattern: this prints every sentence it matches and does not filter by keyword
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher match = re.matcher(str);
String sentenceString = "";
while (match.find()) {
    sentenceString = match.group(0);
    System.out.println(sentenceString);
}

Best answer

Here is an example for the case where you have a predefined list of keywords:

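import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tester {

    public static void main(String[] args) {
        String[] words = {"robbery", "robbed", "robbers"};

        // Join the keywords into one alternation: robbery|robbed|robbers
        String wordRe = String.join("|", words);
        // A sentence is taken to be a run of non-period characters that
        // contains a whole-word keyword and ends with a period
        wordRe = "[^.]*\\b(" + wordRe + ")\\b[^.]*[.]";

        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br1 = new BufferedReader(new FileReader("input"))) {
            String line;
            while ((line = br1.readLine()) != null) {
                // Append a space so words at line ends do not run together
                text.append(line).append(' ');
            }
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
            return;
        }

        Pattern re = Pattern.compile(wordRe, Pattern.CASE_INSENSITIVE);
        Matcher match = re.matcher(text);
        while (match.find()) {
            // trim() drops the whitespace that follows the previous sentence
            System.out.println(match.group(0).trim());
        }
    }
}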

This builds a regular expression of the following form:

[^.]*\b(robbery|robbed|robbers)\b[^.]*[.]
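The pattern above only treats a period as the end of a sentence, and it inserts the keywords into the regex unescaped. As a minimal sketch of one possible variation (not part of the original answer), the version below also accepts `!` and `?` as terminators and escapes each keyword with `Pattern.quote`. The class name `SentenceFinder`, the helper `findSentences`, and the sample text are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SentenceFinder {

    // Hypothetical helper: returns each sentence of text that contains
    // at least one of the keywords as a whole word (case-insensitive).
    public static List<String> findSentences(String text, List<String> keywords) {
        // Pattern.quote keeps any regex metacharacters in a keyword literal
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));

        // Same shape as the pattern above, but '!' and '?' also end a sentence
        Pattern re = Pattern.compile(
                "[^.!?]*\\b(?:" + alternation + ")\\b[^.!?]*[.!?]",
                Pattern.CASE_INSENSITIVE);

        List<String> sentences = new ArrayList<>();
        Matcher m = re.matcher(text);
        while (m.find()) {
            // trim() removes the whitespace left over from the previous sentence
            sentences.add(m.group().trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text made up for this sketch
        String text = "The bank was robbed at noon. Nobody was hurt! "
                + "Was the robbery planned? Police are looking for the robbers.";
        List<String> keywords = Arrays.asList("robbery", "robbed", "robbers");
        for (String sentence : findSentences(text, keywords)) {
            System.out.println(sentence);
        }
    }
}
```

This is still a rough notion of a sentence: abbreviations such as "Mr." or decimal numbers would be split at their periods, so a dedicated sentence splitter may be a better fit for real-world text.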

Regarding java - extracting sentences containing specific words, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24074650/
