opennlp - OpenNLP Name Finder training


Reposted by: 行者123  Updated: 2023-12-04 23:06:16


Following the online manual (http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html), I am building a 15k-line training data file named en-ner-person.train.

My question: should the training file contain the complete reports, or only the lines that contain a name annotation such as <START:person> John Smith <END>?

For example, should the training data contain the whole report:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
A nonexecutive director has many similar responsibilities as an executive director.
However, there are no voting rights with this position.
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .

Or should the training file contain only these two lines:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .

Best answer

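You should use the entire report. The sentences without names teach the model when not to mark an entity, which reduces false positives and so improves precision.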
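To make the point concrete, here is a minimal Python sketch that counts how many tokens fall inside and outside name spans in each version of the training data. It only parses the `<START:person> ... <END>` markup shown in the question and is not part of OpenNLP; the function names `parse_annotated` and `count_tokens` are hypothetical, and the two middle sentences are written in whitespace-tokenized form as an assumption.

```python
def parse_annotated(line):
    """Split one whitespace-tokenized training line into tokens and name spans.

    Returns (tokens, spans); each span is (start, end, type) in token
    offsets, with an exclusive end.
    """
    tokens, spans = [], []
    start, label = None, None
    for piece in line.split():
        if piece.startswith("<START:") and piece.endswith(">"):
            # Opening tag: remember where the span starts and its type.
            start, label = len(tokens), piece[len("<START:"):-1]
        elif piece == "<END>":
            # Closing tag: the span covers every token seen since the opening tag.
            if start is not None:
                spans.append((start, len(tokens), label))
            start, label = None, None
        else:
            tokens.append(piece)
    return tokens, spans


def count_tokens(lines):
    """Count tokens inside and outside name spans across all lines."""
    inside = outside = 0
    for line in lines:
        tokens, spans = parse_annotated(line)
        in_span = sum(end - start for start, end, _ in spans)
        inside += in_span
        outside += len(tokens) - in_span
    return inside, outside


# The four-sentence report from the question (tokenization is an assumption).
report = [
    "<START:person> Pierre Vinken <END> , 61 years old , will join the board "
    "as a nonexecutive director Nov. 29 .",
    "A nonexecutive director has many similar responsibilities as an executive director .",
    "However , there are no voting rights with this position .",
    "Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , "
    "the Dutch publishing group .",
]
names_only = [line for line in report if "<START:" in line]

full_counts = count_tokens(report)
names_only_counts = count_tokens(names_only)
print("whole report (inside, outside):", full_counts)
print("names only   (inside, outside):", names_only_counts)
```

Both versions contain the same name tokens, but the whole report contributes many more tokens that are not names. Those are the examples from which the model can learn when not to tag.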

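You can measure this with the evaluation tool. Hold out some sentences from the corpus for testing, for example 1/10 of the total, and train your model on the remaining 9/10. Train one model on the entire report and another on only the sentences with names, then evaluate both on the same held-out sentences. The results are reported as precision and recall.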
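The split described above can be sketched in Python as follows. This is an illustration rather than an OpenNLP utility: `split_corpus` and `only_named` are hypothetical helpers, the corpus is synthetic placeholder data, and the sketch assumes one sentence per line with no blank lines. It holds out the last tenth of the sentences so the rest stay in their original order; a random or per-report split may suit your corpus better.

```python
def split_corpus(sentences, test_fraction=0.1):
    """Hold out the last test_fraction of the sentences for testing."""
    n_test = max(1, round(len(sentences) * test_fraction))
    return sentences[:-n_test], sentences[-n_test:]


def only_named(sentences):
    """Keep only the sentences that contain at least one name annotation."""
    return [s for s in sentences if "<START:" in s]


# Synthetic placeholder corpus: every fourth sentence contains a name.
corpus = [
    f"<START:person> Name{i} <END> spoke ." if i % 4 == 0
    else f"Nothing to tag in sentence {i} ."
    for i in range(20)
]

train_full, test = split_corpus(corpus)   # 9/10 for training, 1/10 held out
train_names = only_named(train_full)      # second training variant

# Both models are trained on different data but evaluated on the same,
# unfiltered test sentences.
print(len(train_full), len(train_names), len(test))
```

Each list can then be written to its own file, one sentence per line, and passed to the training and evaluation tools.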

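Remember to draw the test sample from the whole report, not only from the sentences with names; otherwise you cannot accurately measure how the model behaves on sentences without names.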
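A small worked example shows why the test set must keep the sentences without names. The Python sketch below computes span-level precision and recall from sets of `(sentence_index, start, end)` tuples. The gold and predicted spans are made-up illustration data, not output from any real model, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(gold, predicted):
    """Span-level precision and recall; a prediction counts only on exact match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans as (sentence_index, start_token, end_token).
gold = {(0, 0, 2), (3, 2, 3)}
# The hypothetical model finds both names but also tags a span in
# sentence 1, which contains no name.
predicted = {(0, 0, 2), (3, 2, 3), (1, 1, 2)}

# Evaluated on the whole report, the false positive lowers precision.
p_full, r_full = precision_recall(gold, predicted)

# Evaluated only on sentences that contain names, the mistake is invisible.
named = {sentence for sentence, _, _ in gold}
p_named, r_named = precision_recall(
    gold, {span for span in predicted if span[0] in named}
)

print(f"whole report: precision={p_full:.2f} recall={r_full:.2f}")
print(f"names only:   precision={p_named:.2f} recall={r_named:.2f}")
```

On the filtered test set the model looks perfect, while the whole-report test set exposes the spurious tag.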

A similar question about opennlp - OpenNLP Name Finder training can be found on Stack Overflow: https://stackoverflow.com/questions/11335013/
