java - Similarity measurement using WordNet?

Reposted. Author: 行者123. Updated: 2023-12-01 15:05:52

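I am using WordNet to compute a similarity measure between two words. I am using edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, but when I compare the words "apple" and "banana" with the Resnik measure, the result is 8.4. Why is it greater than 1?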



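import java.util.TreeMap;

import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Resnik;

public class test {
    // WordNet installation directory and the WordNet version to load
    String dir = "C:/Program Files (x86)/WordNet";
    JWS ws = new JWS(dir, "2.1");

    public void testResnikSimilarity() {
        Resnik res = ws.getResnik();
        System.out.println("Resnik");
        // all senses of both words
        TreeMap<String, Double> scores1 = res.res("apple", "banana", "n");
        // first word fixed to sense 1, all senses of the second word
        // TreeMap<String, Double> scores1 = res.res("apple", 1, "banana", "n");
        // all senses of the first word, second word fixed to sense 2
        // TreeMap<String, Double> scores1 = res.res("apple", "banana", 2, "n");
        for (String s : scores1.keySet())
            System.out.println(s + "\t" + scores1.get(s));
        // one specific pair of senses
        System.out.println("\nspecific pair\t=\t"
                + res.res("apple", 1, "banana", 1, "n") + "\n");
        // highest score over all sense pairs
        System.out.println("\nhighest score\t=\t"
                + res.max("apple", "banana", "n") + "\n\n\n");
    }

    public static void main(String[] args) {
        new test().testResnikSimilarity();
    }
}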


Best answer

Quoting the NLTK documentation:

Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node). Note that for any similarity measure that uses information content, the result is dependent on the corpus used to generate the information content and the specifics of how the information content was created.
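To make the definition concrete, here is a minimal sketch of why such a score is not confined to the range 0 to 1. It assumes the usual formulation IC(c) = -ln P(c), with the natural logarithm, and uses a made-up four-node taxonomy with hypothetical counts. None of the numbers come from the Brown Corpus or the BNC, and the helper names are illustrative only, not part of JWS or NLTK.

```python
import math

# Hypothetical cumulative corpus counts for a toy taxonomy.
# Each count covers the concept itself plus all of its descendants.
counts = {"entity": 1_000_000, "fruit": 300, "apple": 120, "banana": 40}

# Child -> parent links; "entity" is the root.
parent = {"apple": "fruit", "banana": "fruit", "fruit": "entity", "entity": None}


def information_content(concept):
    """IC(c) = -ln P(c), with P(c) estimated relative to the root count."""
    return -math.log(counts[concept] / counts["entity"])


def ancestors(concept):
    """Return the concept followed by its ancestors, most specific first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent[concept]
    return chain


def least_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    return None  # unreachable while both concepts share the root


def resnik(a, b):
    """Resnik similarity: the information content of the least common subsumer."""
    return information_content(least_common_subsumer(a, b))


score = resnik("apple", "banana")
print(score)
```

With these toy counts the least common subsumer is "fruit", whose probability is 300 / 1,000,000, so the score is -ln(0.0003), roughly 8.1. Because the score is the negative logarithm of a probability, it is zero only when the shared ancestor is the root and grows without a fixed upper bound as the shared ancestor becomes rarer.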

I don't know how to set the information content in JWS. In NLTK, you can do the following using data from the Brown Corpus and the BNC:

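from nltk.corpus import wordnet as wn, wordnet_ic

# Assumed senses: the original answer does not show how apple and banana were defined.
apple = wn.synset('apple.n.01')
banana = wn.synset('banana.n.01')
ic = wordnet_ic.ic('ic-brown.dat')
banana.res_similarity(apple, ic=ic)
# value reported in the answer: 8.1703339116227411
ic = wordnet_ic.ic('ic-bnc.dat')
banana.res_similarity(apple, ic=ic)
# value reported in the answer: 7.9753635531935334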

See also the paper for details.

Regarding "java - Similarity measurement using WordNet?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12967153/
