gpt4 book ai didi

java - 如何在 Java 中使用 OpenNLP?

转载 作者:IT老高 更新时间:2023-10-28 21:04:32 25 4
gpt4 key购买 nike

我想对一个英文句子进行 POSTtag 并进行一些处理。我想使用openNLP。我已经安装了

当我执行命令时

I:\Workshop\Programming\nlp\opennlp-tools-1.5.0-bin\opennlp-tools-1.5.0>java -jar opennlp-tools-1.5.0.jar POSTagger models\en-pos-maxent.bin < Text.txt

它给出输出POSTagging Text.txt中的输入

    Loading POS Tagger model ... done (4.009s)
My_PRP$ name_NN is_VBZ Shabab_NNP i_FW am_VBP 22_CD years_NNS old._.


Average: 66.7 sent/s
Total: 1 sent
Runtime: 0.015s

我希望它安装正确?

现在我如何从 Java 应用程序内部进行 POSTtagging?我已将 openNLPtools、jwnl、maxent jar 添加到项目中,但如何调用 POSTtagging?

最佳答案

这是我拼凑起来的一些(旧)示例代码,后面还有现代化的代码:

package opennlp;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

import java.io.File;
import java.io.IOException;
import java.io.StringReader;

public class OpenNlpTest {
public static void main(String[] args) throws IOException {
POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
POSTaggerME tagger = new POSTaggerME(model);

String input = "Can anyone help me dig through OpenNLP's horrible documentation?";
ObjectStream<String> lineStream =
new PlainTextByLineStream(new StringReader(input));

perfMon.start();
String line;
while ((line = lineStream.read()) != null) {

String whitespaceTokenizerLine[] = WhitespaceTokenizer.INSTANCE.tokenize(line);
String[] tags = tagger.tag(whitespaceTokenizerLine);

POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
System.out.println(sample.toString());

perfMon.incrementCounter();
}
perfMon.stopAndPrintFinalResult();
}
}

输出是:

Loading POS Tagger model ... done (2.045s)
Can_MD anyone_NN help_VB me_PRP dig_VB through_IN OpenNLP's_NNP horrible_JJ documentation?_NN

Average: 76.9 sent/s
Total: 1 sent
Runtime: 0.013s

这基本上是在作为 OpenNLP 一部分包含的 POSTaggerTool 类中工作的。 sample.getTags() 是一个 String 数组,其中包含标签类型本身。

这需要直接访问训练数据的文件,这真的很糟糕。

为此更新的代码库略有不同(并且可能更有用。)

首先,一个 Maven POM:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.javachannel</groupId>
<artifactId>opennlp-example</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>[6.8.21,)</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
</project>

这是作为测试编写的代码,因此位于 ./src/test/java/org/javachannel/opennlp/example:

package org.javachannel.opennlp.example;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.stream.Stream;

public class POSTest {
private void download(String url, File destination) throws IOException {
URL website = new URL(url);
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream(destination);
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
}

@DataProvider
Object[][] getCorpusData() {
return new Object[][][]{{{
"Can anyone help me dig through OpenNLP's horrible documentation?"
}}};
}

@Test(dataProvider = "getCorpusData")
public void showPOS(Object[] input) throws IOException {
File modelFile = new File("en-pos-maxent.bin");
if (!modelFile.exists()) {
System.out.println("Downloading model.");
download("http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin", modelFile);
}
POSModel model = new POSModel(modelFile);
PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
POSTaggerME tagger = new POSTaggerME(model);

perfMon.start();
Stream.of(input).map(line -> {
String whitespaceTokenizerLine[] = WhitespaceTokenizer.INSTANCE.tokenize(line.toString());
String[] tags = tagger.tag(whitespaceTokenizerLine);

POSSample sample = new POSSample(whitespaceTokenizerLine, tags);

perfMon.incrementCounter();
return sample.toString();
}).forEach(System.out::println);
perfMon.stopAndPrintFinalResult();
}
}

这段代码实际上并没有测试任何东西——它是一个冒烟测试,如果有的话——但它应该作为一个起点。另一个(可能)好的事情是,如果您还没有下载模型,它会为您下载模型。

关于java - 如何在 Java 中使用 OpenNLP?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5836148/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com