
java - FileInputStream only reads the first word in the file

Reposted. Author: 行者123. Updated: 2023-12-01 11:04:46

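I want to read the words in file.txt token by token, attach a part-of-speech tag to each token, and write the result to file2.txt. The contents of file.txt are already tokenized. Here is my code.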

    public class PoSTagging {
        @SuppressWarnings("resource")
        public static void PoStagMethod() throws IOException {

            FileInputStream fin = new FileInputStream("C:\\Users\\dell\\Desktop\\file.txt");
            DataInputStream in = new DataInputStream(fin);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strline = br.readLine();
            System.out.println(strline + "first");

            try {
                POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
                PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
                POSTaggerME tagger = new POSTaggerME(model);

                String input = strline;
                @SuppressWarnings("deprecation")
                ObjectStream<String> lineStream = new PlainTextByLineStream(new StringReader(input));

                perfMon.start();
                String line;
                while ((line = lineStream.read()) != null) {

                    String whitespaceTokenizerLine[] = WhitespaceTokenizer.INSTANCE.tokenize(line);
                    String[] tags = tagger.tag(whitespaceTokenizerLine);

                    POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
                    System.out.println(sample.toString() + "second");
                    //String t=sample.toString();

                    FileOutputStream fout = new FileOutputStream("C:\\Users\\dell\\Desktop\\file2.txt");
                    //fout.write(t.getBytes());

                    perfMon.incrementCounter();
                    fout.close();
                }
                perfMon.stopAndPrintFinalResult();
            }
            catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

When PoStagMethod() is called from another class, only the first word of file.txt is written to file2.txt. Why doesn't it read the other words in the file? Is something wrong with my code?

Best answer

You can read file.txt line by line with a BufferedReader, process each line with the tagger built from the POSModel, and write the output to file2.txt with a BufferedWriter. The following snippet may help:

    POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    POSTaggerME tagger = new POSTaggerME(model);

    // Open the output file once, before the loop.
    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));

    BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));
    String line;
    // readLine() returns null at end of file, so this loop visits every line.
    while ((line = bufferedReader.readLine()) != null) {
        String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
        String[] tags = tagger.tag(whitespaceTokenizerLine);
        // Do your work with your tags and tokenized words

        // Write whatever string your output needs; POSSample.toString(),
        // as used in the question, is one option.
        bufferedWriter.write(new POSSample(whitespaceTokenizerLine, tags).toString());
        // Puts each tagged line on its own line in the output file;
        // remove the following call if you do not want that:
        bufferedWriter.newLine();
    }

    // Do not forget to flush() and close() the streams after your job is done:
    bufferedWriter.flush();
    bufferedWriter.close();
    bufferedReader.close();
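
The loop is what matters here: a single readLine() call, as in the question's code, returns only the first line of the input. Below is a minimal sketch of the difference using only the JDK. It assumes file.txt holds one token per line (the question does not show the file), and the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {

    // Calls readLine() exactly once, like the question's code:
    // only the first line is ever seen.
    static List<String> readOnce(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            if (line != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Calls readLine() until it returns null, which signals end of input.
    static List<String> readAll(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a tokenized file with one token per line (an assumption).
        String text = "The\nquick\nbrown\nfox";
        System.out.println(readOnce(text)); // [The]
        System.out.println(readAll(text));  // [The, quick, brown, fox]
    }
}
```

A StringReader stands in for the file so the example runs anywhere; a FileReader behaves the same way with respect to readLine().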

If you can, it would also be good to replace the old-style try-catch clause with try-with-resources, which was added in Java 1.7 to close resources automatically.
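
As a rough sketch of what that could look like, the example below reads and writes inside one try-with-resources statement, so both streams are closed (and the writer flushed) even if an exception is thrown. To stay runnable without OpenNLP it makes several assumptions: the tagger is passed in as a plain `Function<String[], String[]>` stand-in, tokens are split on whitespace with `String.split`, and the `token_TAG` output format, class name, and method name are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Function;

class TagFileDemo {

    // Tags every non-blank line of input and writes one "token_TAG token_TAG ..."
    // line per input line. Returns the number of lines written.
    static int tagFile(Path input, Path output,
                       Function<String[], String[]> tagger) throws IOException {
        int written = 0;
        // Both resources are closed automatically, in reverse order of declaration.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] tokens = trimmed.split("\\s+");
                String[] tags = tagger.apply(tokens);
                StringBuilder out = new StringBuilder();
                for (int i = 0; i < tokens.length; i++) {
                    if (i > 0) {
                        out.append(' ');
                    }
                    out.append(tokens[i]).append('_').append(tags[i]);
                }
                writer.write(out.toString());
                writer.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("file", ".txt");
        Path output = Files.createTempFile("file2", ".txt");
        Files.write(input, Arrays.asList("The quick fox", "jumps"));

        // Hypothetical stand-in tagger: labels every token "X".
        Function<String[], String[]> dummyTagger = tokens -> {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "X");
            return tags;
        };

        tagFile(input, output, dummyTagger);
        System.out.println(Files.readAllLines(output)); // [The_X quick_X fox_X, jumps_X]
    }
}
```

In the real program the stand-in would presumably be replaced by the OpenNLP tagger from the snippet above, for example by passing `tagger::tag`, and `String.split` by `WhitespaceTokenizer.INSTANCE.tokenize`.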

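In addition, if you need to write each word and its tag on a separate line, you may need an inner loop that writes to the file. It would look like this: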

    POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    POSTaggerME tagger = new POSTaggerME(model);

    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));

    BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
        String[] tags = tagger.tag(whitespaceTokenizerLine);
        // Loop by index so that each word can be paired with its tag:
        // tags[i] is the tag for whitespaceTokenizerLine[i].
        for (int i = 0; i < whitespaceTokenizerLine.length; i++) {

            // Do your work with your tags and tokenized words

            // "word_TAG" is only one possible output format; write whatever string you need.
            bufferedWriter.write(whitespaceTokenizerLine[i] + "_" + tags[i]);
            // Puts each word/tag pair on its own line in the output file:
            bufferedWriter.newLine();
        }
    }

    // Do not forget to flush() and close() the streams after your job is done:
    bufferedWriter.flush();
    bufferedWriter.close();
    bufferedReader.close();
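
One related detail in the question's code is that the FileOutputStream is created inside the while loop. The single-argument constructors of FileOutputStream and FileWriter truncate an existing file, so if the commented-out write were enabled, each iteration would overwrite what the previous one wrote. The sketch below illustrates this with a temporary file; the class and method names are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TruncateDemo {

    // Reopens the file for every word, as the question's loop does:
    // each new FileWriter truncates the file, so only the last word survives.
    static List<String> reopenEachTime(Path file, String[] words) throws IOException {
        for (String word : words) {
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(file.toFile()))) {
                writer.write(word);
                writer.newLine();
            }
        }
        return Files.readAllLines(file);
    }

    // Opens the file once, before the loop, so every word is kept.
    static List<String> openOnce(Path file, String[] words) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file.toFile()))) {
            for (String word : words) {
                writer.write(word);
                writer.newLine();
            }
        }
        return Files.readAllLines(file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("tagged", ".txt");
        String[] words = {"The", "quick", "fox"};
        System.out.println(reopenEachTime(file, words)); // [fox]
        System.out.println(openOnce(file, words));       // [The, quick, fox]
        Files.delete(file);
    }
}
```

Opening the writer once before the loop, as both snippets above do, avoids the problem; the two-argument constructors with `append` set to `true` are an alternative when the file has to be reopened.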

Hope this helps,

Good luck.

Regarding "java - FileInputStream only reads the first word in the file", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33064915/
