
java - How to classify strings that belong to a specific area using Java?

Reposted. Author: 行者123. Updated: 2023-12-01 13:32:09

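Short introduction
I have extracted a batch of text from a set of PDF files. The text consists of document titles.

My goal is to classify each title by the terms that appear in it. For example, if a title contains Car, it must be classified as automobile.

Example of my goal

Consider the following titles: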

1) DISTRIBUTED MESH NETWORK
2) MONITORING A SELF-CONTAINED SERVER RACK SYSTEM
3) SIDE PANEL FOR AN AUTOMOBILE
4) LOCATION-BASED VEHICLE MESSAGING SYSTEM

These titles must be classified as follows:

1st title contains the term Network, so classify as Networking
2nd title contains the term Server, so classify as Networking
3rd title contains the term automobile, so classify as automobile
4th title contains the term vehicle, so classify as automobile

This is what I need.

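What I have done

To reach this goal, I created a term index for each category in a text file and match it against the titles. If a title contains a word from a category's text file, the title is assigned to that category.

For example

Automobile.txt: car, gear, wheel, clutch
networking.txt: server, IP address, TCP, RIP

Here is the algorithm: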

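String Classify(String title)
{
    // Start as null so the method compiles and signals "no category matched".
    String area = null;
    // compareWordsFrom(file, title) is true when the title contains a word listed in the file.
    if (compareWordsFrom("Automobile.txt", title)) area = "Auto";
    if (compareWordsFrom("networking.txt", title)) area = "Networking";
    if (compareWordsFrom("metals.txt", title)) area = "Metallurgy";
    // If several files match, the last matching category wins.
    return area;
}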
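
The same matching idea can be sketched as a self-contained class. This is only an illustration under assumptions: the class name TitleClassifier and its methods are hypothetical, the term lists are held in memory instead of being read from Automobile.txt and networking.txt, only single-word terms are handled, and the first matching category is returned (the algorithm above keeps the last one).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an in-memory stand-in for the per-category term files.
class TitleClassifier {
    // Category name -> lower-case index terms, kept in insertion order.
    private final Map<String, Set<String>> index = new LinkedHashMap<>();

    // Registers the index terms for one category (what a file like Automobile.txt would hold).
    void addCategory(String category, String... terms) {
        Set<String> lowered = new HashSet<>();
        for (String term : terms) {
            lowered.add(term.toLowerCase(Locale.ROOT));
        }
        index.put(category, lowered);
    }

    // Returns the first category that has a term in the title, or null if none matches.
    String classify(String title) {
        // Split on anything that is not a letter or digit, so "SELF-CONTAINED" yields two words.
        String[] words = title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Set<String> titleWords = new HashSet<>(Arrays.asList(words));
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            for (String term : entry.getValue()) {
                if (titleWords.contains(term)) {
                    return entry.getKey();
                }
            }
        }
        return null;
    }
}
```

Matching on whole words rather than substrings keeps a term such as car from matching an unrelated word such as CARPET. Multi-word terms like IP address would need phrase matching, which this sketch leaves out.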

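My problem
My problem is that it is very hard to find the related words needed to build the index. The automobile area alone has around 1000 related terms, and they are hard to collect.

To be precise, building the term index by hand is a heartbreaking process.

What I need
I need an automated way to do this. Can natural language processing techniques do it? Or is there a ready-made library I can use?

Best answer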

http://en.wikipedia.org/wiki/WordNet

WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and can be downloaded and used freely. The database can also be browsed online.

WordNet: http://wordnet.princeton.edu/
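
One way synonym sets could reduce the manual work is to start from a few seed terms per category and grow the index by following related-word links. The sketch below only illustrates that expansion step and makes several assumptions: RelatedWords is a hypothetical interface, not part of WordNet or any WordNet library, and a real implementation would answer relatedTo by querying WordNet through a Java binding. TermIndexBuilder and maxDepth are likewise names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical lookup interface; a real version would query WordNet instead.
interface RelatedWords {
    List<String> relatedTo(String word);
}

class TermIndexBuilder {
    // Expands the seed terms by following related-word links for up to maxDepth steps.
    static Set<String> expand(Set<String> seeds, RelatedWords source, int maxDepth) {
        Set<String> terms = new LinkedHashSet<>(seeds);
        Deque<String> frontier = new ArrayDeque<>(seeds);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String word : frontier) {
                for (String related : source.relatedTo(word)) {
                    // Only newly discovered terms are explored in the next step.
                    if (terms.add(related)) {
                        next.add(related);
                    }
                }
            }
            frontier = next;
        }
        return terms;
    }
}
```

Keeping maxDepth small matters in practice, because each extra step pulls in words that are less closely tied to the category, so the expanded list would still need a manual review before it is used as an index.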

Regarding "java - How to classify strings that belong to a specific area using Java?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21511189/
