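I'm trying to find the similarity between two words using the WordNet interface of Python's NLTK. The two sample keywords are "game" and "leonardo". First I extracted all the synsets of both words, then I cross-matched every pair of synsets to find their similarity. Here is my code: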
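from nltk.corpus import wordnet as wn

# Written for Python 2 / NLTK 2, where Synset.name and Synset.definition
# are attributes. In NLTK 3 they are methods: x.name(), x.definition().
xx = wn.synsets("game")
yy = wn.synsets("leonardo")

# Compare every synset of "game" with every synset of "leonardo".
for x in xx:
    for y in yy:
        print x.name
        print x.definition
        print y.name
        print y.definition
        print x.wup_similarity(y)
        print '\n'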
Here is the full output:
game.n.01 | a contest with rules to determine a winner | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.285714285714
game.n.02 | a single play of a sport or other contest | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.285714285714
game.n.03 | an amusement or pastime | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.25
game.n.04 | animal hunted for food or sport | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.923076923077
game.n.05 | (tennis) a division of play during which one player serves | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.222222222222
game.n.06 | (games) the score at a particular point or the score needed to win | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.285714285714
game.n.07 | the flesh of wild animals that is used for food | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.5
plot.n.01 | a secret scheme to do something (especially something underhand or illegal) | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.2
game.n.09 | the game equipment needed in order to play a particular game | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.666666666667
game.n.10 | your occupation or line of work | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.25
game.n.11 | frivolous or trifling behavior | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | 0.222222222222
bet_on.v.01 | place a bet on | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | -1
crippled.s.01 | disabled in the feet or legs | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | -1
game.s.02 | willing to face danger | leonardo.n.01 | Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) | -1
But the similarity between game.n.04 and leonardo.n.01 is really strange. I don't think the similarity (0.923076923077) should be this high:
game.n.04
animal hunted for food or sport
leonardo.n.01
Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519)
0.923076923077
Am I misunderstanding something?
Best answer
According to the docs, the wup_similarity() method returns...
...a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
...and...
>>> from nltk.corpus import wordnet as wn
>>> game = wn.synset('game.n.04')
>>> leonardo = wn.synset('leonardo.n.01')
>>> game.lowest_common_hypernyms(leonardo)
[Synset('organism.n.01')]
>>> organism = game.lowest_common_hypernyms(leonardo)[0]
>>> game.shortest_path_distance(organism)
2
>>> leonardo.shortest_path_distance(organism)
3
...which is why it considers them similar, although I see that...
>>> game.wup_similarity(leonardo)
0.7058823529411765
...is a different value for some reason.
Update
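For reference, the Wu-Palmer measure is commonly written as 2 * depth(LCS) / (depth(s1) + depth(s2)). The sketch below is a simplified illustration, not NLTK's actual code: it assumes each sense's depth equals the depth of the LCS plus that sense's shortest path distance to it, and plugs in the distances 2 and 3 from the session above. The LCS depth of 6 is an assumed value, chosen because it reproduces the 0.7058823529411765 the session prints.

```python
def wup_from_depths(lcs_depth, dist1, dist2):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(s1) + depth(s2)).

    Assumes each sense's depth equals the LCS depth plus that sense's
    shortest path distance to the LCS (a simplification, not NLTK's code).
    """
    depth1 = lcs_depth + dist1
    depth2 = lcs_depth + dist2
    return 2 * lcs_depth / (depth1 + depth2)


# Distances from the session: game.n.04 -> organism.n.01 is 2 and
# leonardo.n.01 -> organism.n.01 is 3. An LCS depth of 6 is an assumed
# value, picked because it reproduces the printed score.
score = wup_from_depths(6, 2, 3)
print(score)  # 0.7058823529411765, i.e. 12/17

# With the same distances, a deeper shared ancestor gives a higher score.
print(wup_from_depths(10, 2, 3) > score)  # True
```

Because the score grows with the depth of the shared ancestor, two senses that both sit a few levels below a fairly deep node such as organism.n.01 can score high even when their meanings are unrelated.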
I want some measurement which will show that dissimilarity('game', 'chess') is much much less than dissimilarity('game', 'leonardo')
How about something like this...
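from nltk.corpus import wordnet as wn
from itertools import product

def compare(word1, word2):
    ss1 = wn.synsets(word1)
    ss2 = wn.synsets(word2)
    # Best score over all sense pairs. "or 0" treats a missing score (None,
    # e.g. when two senses have no connecting path) as 0, so max() also
    # works on Python 3, where None cannot be compared with floats.
    return max((s1.path_similarity(s2) or 0) for (s1, s2) in product(ss1, ss2))

for word1, word2 in (('game', 'leonardo'), ('game', 'chess')):
    print("Path similarity of %-10s and %-10s is %.2f" % (word1, word2, compare(word1, word2)))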
...which prints...
Path similarity of game and leonardo is 0.17
Path similarity of game and chess is 0.25
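The compare() function above scores two words by their best-matching pair of senses, and NLTK computes path_similarity as 1 / (shortest path length + 1) in the hypernym hierarchy. The sketch below reproduces that idea without NLTK on a tiny taxonomy; every node name, link, and word-to-sense mapping in it is hypothetical, not real WordNet data, so only the max-over-sense-pairs logic carries over.

```python
from collections import deque
from itertools import product

# Hypothetical toy taxonomy: node -> list of hypernyms (parents).
HYPERNYMS = {
    "entity": [],
    "organism": ["entity"],
    "animal": ["organism"],
    "game_animal": ["animal"],
    "person": ["organism"],
    "painter": ["person"],
    "leonardo": ["painter"],
    "activity": ["entity"],
    "contest": ["activity"],
    "game_contest": ["contest"],
    "board_game": ["game_contest"],
    "chess_game": ["board_game"],
}

# Hypothetical word -> senses mapping (stand-in for wn.synsets).
SENSES = {
    "game": ["game_contest", "game_animal"],
    "leonardo": ["leonardo"],
    "chess": ["chess_game"],
}


def neighbors(node):
    """Parents and children of node, so a path can go up and back down."""
    children = [child for child, parents in HYPERNYMS.items() if node in parents]
    return HYPERNYMS[node] + children


def shortest_path_distance(a, b):
    """Breadth-first search over hypernym links treated as undirected edges."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connecting path


def path_similarity(a, b):
    """1 / (shortest path length + 1), or None when no path exists."""
    dist = shortest_path_distance(a, b)
    return None if dist is None else 1.0 / (dist + 1)


def compare(word1, word2):
    """Best path similarity over all sense pairs; a missing score counts as 0."""
    return max((path_similarity(s1, s2) or 0.0)
               for s1, s2 in product(SENSES[word1], SENSES[word2]))


for w1, w2 in (("game", "leonardo"), ("game", "chess")):
    print("Path similarity of %-10s and %-10s is %.2f" % (w1, w2, compare(w1, w2)))
```

On this toy graph, "game" and "chess" are two links apart (score 0.33), while the closest senses of "game" and "leonardo" are five links apart (score 0.17), so the ordering matches the one printed above.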
This Q&A about python nltk returning odd results for WordNet similarity measures is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/17296588/