python - word2vec_basic output: trying to test word similarity versus human similarity scores

Reposted by: 太空宇宙 | Updated: 2023-11-04 05:11:30

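As a way of getting familiar with Tensorflow, I am trying to verify that the word embeddings produced by word2vec_basic.py (see tutorial) make sense when checked against human similarity scores. However, the results are surprisingly disappointing. This is what I did.

In word2vec_basic.py, I added another step at the end to save the embeddings and the reverse dictionary to disk (so I don't have to regenerate them every time):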

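import pickle  # needed for the dump below

# Save the embedding matrix and the index-to-word mapping for later reuse.
with open("embeddings", 'wb') as f:
    np.save(f, final_embeddings)
with open("reverse_dictionary", 'wb') as f:
    pickle.dump(reverse_dictionary, f, pickle.HIGHEST_PROTOCOL)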

In my own word2vec_test.py, I load them and build a forward (word-to-index) dictionary for lookups:

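import pickle
import numpy as np
# Load the saved embedding matrix and the index-to-word mapping.
with open("embeddings", 'rb') as f:
    embeddings = np.load(f)
with open("reverse_dictionary", 'rb') as f:
    reverse_dictionary = pickle.load(f)
# Invert the mapping: word -> row index in the embedding matrix.
dictionary = dict(zip(reverse_dictionary.values(), reverse_dictionary.keys()))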

Then I measure similarity as the Euclidean distance between the embedding vectors:

def distance(w1, w2):
    try:
        return np.linalg.norm(embeddings[dictionary[w1]] - embeddings[dictionary[w2]])
    except KeyError:
        return None  # no such word in our dictionary

So far the results make sense: for example, distance('before', 'after') is smaller than distance('before', 'into').
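A broader sanity check than a single pair is to list a word's nearest neighbours. The sketch below is illustrative only: `nearest` is a hypothetical helper, and the toy `toy_embeddings` and `toy_dictionary` are made-up stand-ins for the arrays loaded above.

```python
import numpy as np

def nearest(word, embeddings, dictionary, k=3):
    """Return the k words closest to `word` by Euclidean distance (hypothetical helper)."""
    if word not in dictionary:
        return []  # word is not in the vocabulary
    reverse = {idx: w for w, idx in dictionary.items()}
    diffs = embeddings - embeddings[dictionary[word]]
    dists = np.linalg.norm(diffs, axis=1)
    order = np.argsort(dists)
    # Skip the query word itself (distance 0).
    return [reverse[i] for i in order if i != dictionary[word]][:k]

# Toy data, invented for illustration (not real word2vec output).
toy_dictionary = {"before": 0, "after": 1, "into": 2, "cat": 3}
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])

print(nearest("before", toy_embeddings, toy_dictionary, k=2))
```

With real embeddings, neighbours that look unrelated to the query word would be an early hint that the model is undertrained.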

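Then I downloaded the human scores from http://alfonseca.org/pubs/ws353simrel.tar.gz (I borrowed the link and the code below from the Swivel project in the "Model Zoo"). I compare the human similarity scores with the embedding distances as follows: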

acts, preds = [], []  # human scores and negated model distances
with open("wordsim353_sim_rel/wordsim_relatedness_goldstandard.txt", 'r') as lines:
    for line in lines:
        w1, w2, act = line.strip().split('\t')
        pred = distance(w1, w2)
        if pred is None:
            continue  # skip pairs with an out-of-vocabulary word

        acts.append(float(act))
        preds.append(-pred)

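I use -pred because the human scores increase with similarity, so the distance ordering has to be inverted to match (a smaller distance means greater similarity).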
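The effect of the sign flip can be checked on made-up numbers. In this sketch the scores and distances are invented for illustration; only `scipy.stats.spearmanr` is a real API.

```python
import scipy.stats

# Invented example data: a higher human score means more similar,
# a smaller distance means more similar.
human_scores = [9.0, 7.5, 4.0, 1.2]
distances = [0.3, 0.5, 1.1, 1.6]

rho_raw, _ = scipy.stats.spearmanr(human_scores, distances)
rho_neg, _ = scipy.stats.spearmanr(human_scores, [-d for d in distances])

print(rho_raw)  # negative: distances fall as human scores rise
print(rho_neg)  # positive: negated distances rank like similarities
```

Because Spearman's rho depends only on ranks, negating the distances changes the sign of the coefficient and nothing else.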

Then I compute the correlation coefficient:

import scipy.stats

rho, _ = scipy.stats.spearmanr(acts, preds)
print(rho)

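But the result is very small, something like 0.006. I retrained word2vec_basic with a context of 4 words and a vector length of 256, but it did not improve at all. Then I used a cosine-based distance instead of the Euclidean distance: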

def distance(w1, w2):
    if w1 not in dictionary or w2 not in dictionary:
        return None  # no such word in our dictionary
    return scipy.spatial.distance.cosine(embeddings[dictionary[w1]], embeddings[dictionary[w2]])

Still no correlation.
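One possible reason the switch makes no difference: if the saved vectors have unit length (this appears to be the case for `final_embeddings` in word2vec_basic.py, but it is an assumption here, not something the post confirms), the squared Euclidean distance equals twice the cosine distance. Both measures then rank word pairs identically, so a rank correlation cannot change. A sketch on random unit vectors (made-up data):

```python
import numpy as np
import scipy.spatial.distance
import scipy.stats

rng = np.random.default_rng(0)
vectors = rng.normal(size=(50, 16))
# Assumption: embeddings are L2-normalized to unit length.
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Compare each of the first 25 vectors with one of the last 25.
pairs = [(i, i + 25) for i in range(25)]
euclid = [np.linalg.norm(vectors[a] - vectors[b]) for a, b in pairs]
cosine = [scipy.spatial.distance.cosine(vectors[a], vectors[b]) for a, b in pairs]

# For unit vectors, ||a - b||^2 == 2 * (1 - cos(a, b)).
print(np.allclose(np.square(euclid), 2 * np.array(cosine)))

# Identical ordering, so the rank correlation between the two measures is 1.
rho, _ = scipy.stats.spearmanr(euclid, cosine)
print(rho)
```

If the embeddings are not normalized, the two measures can disagree, so checking `np.linalg.norm(embeddings, axis=1)` on the real data would settle the question.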

So, what am I misunderstanding or doing wrong?

Best answer

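Answering my own question: yes, the results are depressing, but that is because the model is too small and trained on too little data. It is as simple as that. The implementation I experimented with uses a corpus of 17M words, runs for 100K steps, takes only 2 adjacent context words, and has an embedding size of 128. I got a larger Wikipedia sample of 124M words, increased the context to 24 words (12 on each side) and the embedding size to 256, trained for 1.8M steps, and voila! The correlation (as measured in my question above) grew to 0.24.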
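To make the "context words" setting concrete: in a skip-gram setup, a window of w words on each side yields up to 2w context words per target word. The sketch below is a simplified pair generator, not the batching code from word2vec_basic.py; `skipgram_pairs` and its parameter names are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Yield (target, context) pairs using `window` words on each side."""
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]

sentence = "the quick brown fox jumps".split()
pairs = list(skipgram_pairs(sentence, window=1))
print(pairs)
print(len(list(skipgram_pairs(sentence, window=2))))
```

A wider window produces more training pairs per token and pairs words that sit further apart, which is one reason it needs more training steps to pay off.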

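Then I implemented subsampling of frequent words as described in this tutorial, and the correlation jumped further to 0.33. Finally, I left my laptop running overnight, training with 36 context words and 3.2M steps, and it went all the way to 0.42! I think we can call that a success.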
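The tutorial link is not preserved here, so the exact subsampling variant used is unknown. As a hedged illustration, the sketch below follows the rule from the original word2vec paper (Mikolov et al., 2013), where each occurrence of a word w is discarded with probability 1 - sqrt(t / f(w)), with f(w) the word's relative frequency and t a small threshold. The function name, the threshold value, and the toy corpus are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

def subsample(tokens, threshold=1e-5, seed=0):
    """Randomly drop frequent tokens; keep probability is min(1, sqrt(t / f))."""
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for tok in tokens:
        freq = counts[tok] / total
        keep_prob = min(1.0, math.sqrt(threshold / freq))
        if rng.random() < keep_prob:
            kept.append(tok)
    return kept

# Toy corpus: "the" is very frequent, "embedding" is rare.
corpus = ["the"] * 9000 + ["embedding"] * 5 + ["vector"] * 995
kept = Counter(subsample(corpus, threshold=1e-3))
print(kept["the"], kept["embedding"], kept["vector"])
```

Rare words are always kept while very frequent words are thinned heavily, so more of each training window is spent on informative neighbours.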

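So, for those toying with it like me, it looks like a game that takes a lot of data, a lot of patience, and NVidia hardware (which I don't currently have). But it is still fun.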

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/42881590/
