python nltk returns odd results for wordnet similarity measures


Reposted by: 太空狗 | Updated: 2023-10-30 00:50:51


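I am trying to find the similarity between two words using python nltk's wordnet. The two example keywords are "game" and "leonardo". First, I extracted all synsets of both words, then cross-matched every pair of synsets to find their similarity. Here is my code: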

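from nltk.corpus import wordnet as wn

# All senses (synsets) of each word, across every part of speech
xx = wn.synsets("game")
yy = wn.synsets("leonardo")

# Compare every sense of "game" with every sense of "leonardo"
for x in xx:
    for y in yy:
        # In NLTK 3, name() and definition() are methods, not attributes
        print(x.name())
        print(x.definition())
        print(y.name())
        print(y.definition())
        # Wu-Palmer similarity; the older NLTK used here printed -1 for pairs
        # with no usable common hypernym (see the verb/adjective rows below)
        print(x.wup_similarity(y))
        print()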

Here is the full output:

In every row the second synset is leonardo.n.01 ("Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519)"). The columns are the "game" synset, its wup_similarity to leonardo.n.01, and its definition:

synset         wup_similarity   definition
game.n.01      0.285714285714   a contest with rules to determine a winner
game.n.02      0.285714285714   a single play of a sport or other contest
game.n.03      0.25             an amusement or pastime
game.n.04      0.923076923077   animal hunted for food or sport
game.n.05      0.222222222222   (tennis) a division of play during which one player serves
game.n.06      0.285714285714   (games) the score at a particular point or the score needed to win
game.n.07      0.5              the flesh of wild animals that is used for food
plot.n.01      0.2              a secret scheme to do something (especially something underhand or illegal)
game.n.09      0.666666666667   the game equipment needed in order to play a particular game
game.n.10      0.25             your occupation or line of work
game.n.11      0.222222222222   frivolous or trifling behavior
bet_on.v.01    -1               place a bet on
crippled.s.01  -1               disabled in the feet or legs
game.s.02      -1               willing to face danger

But the similarity between game.n.04 and leonardo.n.01 is really odd. I don't think the similarity (0.923076923077) should be this high.

game.n.04

animal hunted for food or sport

leonardo.n.01

Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519)

0.923076923077

Is there something wrong with my understanding?

Best answer

According to the docs, the wup_similarity() method returns...

...a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).

...and...

>>> from nltk.corpus import wordnet as wn
>>> game = wn.synset('game.n.04')
>>> leonardo = wn.synset('leonardo.n.01')
>>> game.lowest_common_hypernyms(leonardo)
[Synset('organism.n.01')]
>>> organism = game.lowest_common_hypernyms(leonardo)[0]
>>> game.shortest_path_distance(organism)
2
>>> leonardo.shortest_path_distance(organism)
3

...which is why it considers them similar, although I actually get...

>>> game.wup_similarity(leonardo)
0.7058823529411765

...which is different for some reason.
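To see where a number like 0.7058823529411765 comes from: as I understand NLTK 3's implementation, wup_similarity() computes 2 * depth / (len1 + len2), where depth is the lowest common subsumer's max_depth() + 1, and each len is that synset's shortest_path_distance() to the subsumer plus depth. The sketch below is plain arithmetic, not an NLTK call, and the function name is hypothetical. It plugs in the distances 2 and 3 from the session above and assumes organism.n.01 has max_depth() 5 (entity → physical_entity → object → whole → living_thing → organism), which is an assumption about the WordNet data rather than something shown in the answer.

```python
def wup_from_distances(subsumer_max_depth, dist1, dist2):
    """Wu-Palmer score from path lengths, following (as assumed) NLTK 3's formula.

    subsumer_max_depth: max_depth() of the lowest common subsumer (assumed value)
    dist1, dist2: shortest_path_distance() of each synset to that subsumer
    """
    # Edges from the root plus one, i.e. nodes on the root-to-subsumer path
    depth = subsumer_max_depth + 1
    len1 = dist1 + depth
    len2 = dist2 + depth
    return (2.0 * depth) / (len1 + len2)


# game.n.04 -> organism.n.01: 2 edges; leonardo.n.01 -> organism.n.01: 3 edges
# (from the session above). max_depth() 5 for organism.n.01 is an assumption.
score = wup_from_distances(5, 2, 3)
print(score)  # 0.7058823529411765, i.e. 12 / 17
```

Under this formula the score depends only on how deep the shared ancestor sits and how far each sense is from it, not on what the senses mean. A deep shared ancestor such as organism.n.01 with short paths to both senses therefore yields a high score even for an animal and a painter.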

Update

I want some measurement which will show that dissimilarity('game', 'chess') is much much less than dissimilarity('game', 'leonardo')

How about something like this...

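from itertools import product

from nltk.corpus import wordnet as wn

def compare(word1, word2):
    # Best path similarity over every pair of senses of the two words
    ss1 = wn.synsets(word1)
    ss2 = wn.synsets(word2)
    scores = [s1.path_similarity(s2) for (s1, s2) in product(ss1, ss2)]
    # Drop pairs without a score (None) so max() only compares numbers
    scores = [s for s in scores if s is not None]
    return max(scores) if scores else None

for word1, word2 in (('game', 'leonardo'), ('game', 'chess')):
    print("Path similarity of %-10s and %-10s is %.2f" % (word1,
                                                          word2,
                                                          compare(word1, word2)))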

...which prints...

Path similarity of game       and leonardo   is 0.17
Path similarity of game       and chess      is 0.25
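The answer does not say what path_similarity() measures. Assuming NLTK 3's definition, the score is 1 / (d + 1), where d is the number of edges on the shortest is-a path between the two senses. Under that assumption the rounded scores printed above can be converted back into path lengths. The helper names below are hypothetical, and the snippet only does arithmetic; it does not call NLTK.

```python
def path_similarity_from_distance(distance):
    """Assumed NLTK 3 path_similarity formula: 1 / (edges on shortest path + 1)."""
    return 1.0 / (distance + 1)


def distances_matching(printed_score, max_distance=30):
    """Edge counts whose score prints as printed_score under '%.2f', as in compare()."""
    return [
        d for d in range(max_distance + 1)
        if "%.2f" % path_similarity_from_distance(d) == printed_score
    ]


print(distances_matching("0.25"))  # game vs chess    -> [3]
print(distances_matching("0.17"))  # game vs leonardo -> [5]
```

So, if the formula holds, the best game/chess sense pair is 3 edges apart and the best game/leonardo pair is 5 edges apart. Taking the maximum over all sense pairs keeps only the closest pair, which is why the ranking comes out the way the asker wanted even though individual pairs (such as game.n.04 and leonardo.n.01 under wup_similarity) can look surprisingly close.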

Regarding python nltk returning odd results for wordnet similarity measures, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17296588/
