
java - OpenNLP sentence training example

Reposted. Author: 行者123. Updated: 2023-11-30 10:52:09

I am trying to train a new model using the example from the manual on the official OpenNLP website. The example is:


    Charset charset = Charset.forName("UTF-8");
    ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("en-sent.train"), charset);
    ObjectStream<SentenceSample> sampleStream = new SentenceSampleStream(lineStream);
    SentenceModel model;
    try {
        model = SentenceDetectorME.train("en", sampleStream, true, null, TrainingParameters.defaultParams());
    } finally {
        sampleStream.close();
    }
    OutputStream modelOut = null;
    try {
        modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
        model.serialize(modelOut);
    } finally {
        if (modelOut != null)
            modelOut.close();
    }

The problem is on the second line:

    
    ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("en-sent.train"), charset);

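The documentation tells me: "Deprecated. Use PlainTextByLineStream(InputStreamFactory, Charset) instead." But I don't know how to use that constructor. I would like an example that uses the non-deprecated constructor with the same corpus file.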

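With the help of the OpenNLP documentation I wrote the following code. It calls train in two ways: the deprecated overload and the one the documentation suggests: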

    Charset charset = Charset.forName("UTF-8");
    InputStreamFactory inputStreamFactory = null;
    ObjectStream<String> lineStream = null;
    ObjectStream<SentenceSample> sampleStream = null;
    SentenceModel model = null;
    OutputStream modelOut = null;
    try {
        inputStreamFactory = InputStreamFactory.class.newInstance();
        lineStream = new PlainTextByLineStream(inputStreamFactory, charset);
        sampleStream = new SentenceSampleStream(lineStream);
        // The deprecated:
        model = SentenceDetectorME.train("en", sampleStream, true, null, TrainingParameters.defaultParams());
        // The suggested:
        model = SentenceDetectorME.train("en", sampleStream, new SentenceDetectorFactory(), new TrainingParameters());
    } catch (InstantiationException e2) {
        e2.printStackTrace();
    } catch (IllegalAccessException e2) {
        e2.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            sampleStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    try {
        modelOut = new BufferedOutputStream(new FileOutputStream(new File("modelFile")));
        model.serialize(modelOut);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (modelOut != null) try {
            modelOut.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

But in this new code I don't know where to supply my corpus data file. Any ideas?

Best answer

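You have to initialize inputStreamFactory with the data file you want. InputStreamFactory is an interface, so InputStreamFactory.class.newInstance() cannot create an instance of it; use a concrete implementation backed by your file instead:

    inputStreamFactory = new MarkableFileInputStreamFactory(
            new File("en-sent.train"));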
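
To illustrate the two points behind this answer, here is a minimal sketch that uses only the JDK, not OpenNLP itself. `StreamFactory` is a hypothetical stand-in that assumes the same single-method shape as OpenNLP's `InputStreamFactory`; `canInstantiate`, `fromString`, and `countLines` are likewise made-up helper names. It shows that an interface cannot be created reflectively, and that a concrete factory can hand out a fresh stream for every pass over the data, which is presumably why the non-deprecated constructor asks for a factory rather than an already opened stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamFactorySketch {

    /** Hypothetical stand-in assumed to mirror the shape of OpenNLP's InputStreamFactory. */
    public interface StreamFactory {
        InputStream createInputStream() throws IOException;
    }

    /** Returns true if the type can be created reflectively through a no-argument constructor. */
    public static boolean canInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // Interfaces declare no constructors, so they always end up here.
            return false;
        }
    }

    /** Concrete factory over an in-memory corpus; a file-backed one would open the file instead. */
    public static StreamFactory fromString(String corpus) {
        byte[] bytes = corpus.getBytes(StandardCharsets.UTF_8);
        return () -> new ByteArrayInputStream(bytes);
    }

    /** Performs one full pass over the data and returns the number of lines read. */
    public static int countLines(StreamFactory factory) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(factory.createInputStream(), StandardCharsets.UTF_8))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An interface cannot be instantiated reflectively.
        System.out.println(canInstantiate(StreamFactory.class)); // false

        // A concrete factory can be asked for a new stream as often as needed.
        StreamFactory factory = fromString("First sentence.\nSecond sentence.\n");
        System.out.println(countLines(factory)); // 2
        System.out.println(countLines(factory)); // 2 (second pass uses a fresh stream)
    }
}
```

In the question's code, the corresponding fix is the single assignment shown in the answer: the factory is built from the corpus file, and the rest of the training code can stay as it is.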

Regarding java - OpenNLP sentence training example, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34457087/
