Python NLTK returns odd results for WordNet similarity measures

Repost. Author: 太空狗. Updated: 2023-10-30 00:50:51
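I am trying to find the similarity between two words using WordNet through Python's NLTK. The two example keywords are "game" and "leonardo". First I extract all synsets for both words, then I cross-match every pair of synsets to compute their similarity. Here is my code: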

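from nltk.corpus import wordnet as wn

# Python 3 / NLTK 3: name() and definition() are methods
# (they were plain attributes in NLTK 2).
xx = wn.synsets("game")
yy = wn.synsets("leonardo")
for x in xx:
    for y in yy:
        print(x.name())
        print(x.definition())
        print(y.name())
        print(y.definition())
        print(x.wup_similarity(y))
        print('\n')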

Here is the full output:

game.n.01 a contest with rules to determine a winner leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714

game.n.02 a single play of a sport or other contest leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714

game.n.03 an amusement or pastime leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.25

game.n.04 animal hunted for food or sport leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.923076923077

game.n.05 (tennis) a division of play during which one player serves leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.222222222222

game.n.06 (games) the score at a particular point or the score needed to win leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714

game.n.07 the flesh of wild animals that is used for food leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.5

plot.n.01 a secret scheme to do something (especially something underhand or illegal) leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.2

game.n.09 the game equipment needed in order to play a particular game leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.666666666667

game.n.10 your occupation or line of work leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.25

game.n.11 frivolous or trifling behavior leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.222222222222

bet_on.v.01 place a bet on leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1

crippled.s.01 disabled in the feet or legs leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1

game.s.02 willing to face danger leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1

But the similarity between game.n.04 and leonardo.n.01 is really strange. I don't think the similarity (0.923076923077) should be this high.

game.n.04

animal hunted for food or sport

leonardo.n.01

Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519)

0.923076923077

Is there something wrong with my understanding?

Best answer

According to the docs, the wup_similarity() method returns...

...a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).

...and...

>>> from nltk.corpus import wordnet as wn
>>> game = wn.synset('game.n.04')
>>> leonardo = wn.synset('leonardo.n.01')
>>> game.lowest_common_hypernyms(leonardo)
[Synset('organism.n.01')]
>>> organism = game.lowest_common_hypernyms(leonardo)[0]
>>> game.shortest_path_distance(organism)
2
>>> leonardo.shortest_path_distance(organism)
3

...which is why it considers them similar, although I get...

>>> game.wup_similarity(leonardo)
0.7058823529411765

...which is different for some reason.
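
For reference, the Wu-Palmer measure is usually written as 2 * depth(LCS) / (depth(s1) + depth(s2)), where each sense's depth can be taken as the depth of the LCS plus that sense's distance to it. The sketch below only redoes this arithmetic with the numbers from the session above. The helper name `wu_palmer` is hypothetical, and the depth of 6 for organism.n.01 (counting the root as depth 1) is an assumption, not something shown in the output. With those inputs the formula gives 12/17, roughly 0.706, which lines up with the score reported here. NLTK's real implementation may differ in details such as how depth is chosen when a synset has several hypernym paths.

```python
def wu_palmer(lcs_depth, dist1, dist2):
    """Wu-Palmer score from the LCS depth and each sense's distance to the LCS."""
    depth1 = lcs_depth + dist1  # depth of the first sense, measured through the LCS
    depth2 = lcs_depth + dist2  # depth of the second sense, measured through the LCS
    return 2.0 * lcs_depth / (depth1 + depth2)


# Distances 2 and 3 come from shortest_path_distance() in the session above.
# lcs_depth=6 for organism.n.01 is an ASSUMPTION (root counted as depth 1).
score = wu_palmer(lcs_depth=6, dist1=2, dist2=3)
print(score)
```

The score depends only on how deep the shared ancestor is relative to the two senses. Two otherwise unrelated senses can therefore score high when both sit just a few steps below a fairly deep common ancestor such as "organism".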


Update

I want some measurement which will show that dissimilarity('game', 'chess') is much much less than dissimilarity('game', 'leonardo')

How about something like this...

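from nltk.corpus import wordnet as wn
from itertools import product

def compare(word1, word2):
    """Return the highest path similarity over all synset pairs of the two words."""
    ss1 = wn.synsets(word1)
    ss2 = wn.synsets(word2)
    # path_similarity() can return None when no path connects two synsets
    # (for example across parts of speech); skip those so max() works in Python 3.
    scores = (s1.path_similarity(s2) for (s1, s2) in product(ss1, ss2))
    return max(score for score in scores if score is not None)

for word1, word2 in (('game', 'leonardo'), ('game', 'chess')):
    print("Path similarity of %-10s and %-10s is %.2f" % (word1,
                                                          word2,
                                                          compare(word1, word2)))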

...which prints...

Path similarity of game       and leonardo   is 0.17
Path similarity of game       and chess      is 0.25
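
To read these scores as the "dissimilarity" asked for above: path similarity is defined as 1 / (shortest path length + 1), so it can be inverted into a number of hops in the hypernym graph. The sketch below does only that conversion, and both helper names are hypothetical. Under this reading, the printed 0.25 corresponds to 3 hops for game/chess, and 0.17 (1/6 rounded to two decimals) to 5 hops for game/leonardo.

```python
def path_similarity_from_distance(distance):
    """Path similarity for two senses that are `distance` edges apart."""
    return 1.0 / (distance + 1)


def distance_from_path_similarity(similarity):
    """Invert the formula; round() absorbs the two-decimal rounding of printed scores."""
    return int(round(1.0 / similarity)) - 1


# Scores as printed above (already rounded to two decimals).
for pair, score in ((("game", "leonardo"), 0.17), (("game", "chess"), 0.25)):
    print(pair, "is about", distance_from_path_similarity(score), "hops apart")
```

The hop count can serve as a simple dissimilarity. Like the `compare` function, it reflects only the closest pair of senses for the two words.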

Regarding Python NLTK returning odd results for WordNet similarity measures, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17296588/
