python - How to visualize Gensim Word2vec embeddings in Tensorboard Projector

Reposted by: 行者123 · Updated: 2023-12-04 13:24:46

Following the gensim word2vec embedding tutorial, I trained a simple word2vec model:

from gensim.test.utils import common_texts
from gensim.models import Word2Vec
model = Word2Vec(sentences=common_texts, size=100, window=5, min_count=1, workers=4)
model.save("/content/word2vec.model")

I want to visualize it using the Embedding Projector in TensorBoard. There is another straightforward tutorial in the gensim documentation. I did the following in Colab:

!python3 -m gensim.scripts.word2vec2tensor -i /content/word2vec.model -o /content/my_model

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/gensim/scripts/word2vec2tensor.py", line 94, in <module>
    word2vec2tensor(args.input, args.output, args.binary)
  File "/usr/local/lib/python3.7/dist-packages/gensim/scripts/word2vec2tensor.py", line 68, in word2vec2tensor
    model = gensim.models.KeyedVectors.load_word2vec_format(word2vec_model_path, binary=binary)
  File "/usr/local/lib/python3.7/dist-packages/gensim/models/keyedvectors.py", line 1438, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "/usr/local/lib/python3.7/dist-packages/gensim/models/utils_any2vec.py", line 172, in _load_word2vec_format
    header = utils.to_unicode(fin.readline(), encoding=encoding)
  File "/usr/local/lib/python3.7/dist-packages/gensim/utils.py", line 355, in any2unicode
    return unicode(text, encoding, errors=errors)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

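Note that I did check this exact same question from 2018 first, but its accepted answer no longer works because both gensim and TensorFlow have since been updated, so I think it is worth asking again in Q4 2021.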
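The error can be reproduced without gensim. The sketch below assumes, as gensim's SaveLoad machinery does, that model.save() pickles the model object. Pickle protocol 2 and later begin the stream with the PROTO opcode, byte 0x80, so any loader that reads the first line as a UTF-8 text header fails on the very first byte. The state dict and the helper first_line_as_text are hypothetical stand-ins, not gensim code.

```python
import pickle


def first_line_as_text(blob: bytes) -> str:
    """Decode the first line of a file's bytes as UTF-8, like a text-format header reader would."""
    return blob.split(b"\n", 1)[0].decode("utf-8")


# Hypothetical stand-in for a trained model; any pickled object starts the same way.
state = {"vocab": ["human", "interface"], "vector_size": 100}
data = pickle.dumps(state, protocol=2)
print(hex(data[0]))  # 0x80, the pickle PROTO opcode

# Treating the pickle as a text file with a header line fails immediately.
try:
    first_line_as_text(data)
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)
print(error_message)
```

This is the same "can't decode byte 0x80 in position 0: invalid start byte" message as in the traceback.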

Best answer

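Saving the model in the format of the original C word2vec implementation, i.e. with model.wv.save_word2vec_format("/content/word2vec.model") instead of model.save(...), solved the problem: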

from gensim.test.utils import common_texts
from gensim.models import Word2Vec
model = Word2Vec(sentences=common_texts, size=100, window=5, min_count=1, workers=4)
model.wv.save_word2vec_format("/content/word2vec.model")
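To see why the script can read this file, here is a minimal sketch of the plain-text word2vec layout, assuming save_word2vec_format is called with its default binary=False: a header line "<vocabulary size> <vector size>", then one line per word with its components separated by spaces. write_word2vec_text is a hypothetical helper that only imitates that layout; it is not gensim's implementation.

```python
import os
import tempfile
from typing import Dict, List


def write_word2vec_text(vectors: Dict[str, List[float]], path: str) -> None:
    """Write vectors in the plain-text word2vec layout (hypothetical helper, not gensim code)."""
    if not vectors:
        raise ValueError("no vectors to write")
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must have the same length")
    (dim,) = dims
    with open(path, "w", encoding="utf-8") as fout:
        # Header line: "<vocabulary size> <vector size>" in plain ASCII.
        fout.write(f"{len(vectors)} {dim}\n")
        # One line per word: the word, then its components separated by spaces.
        for word, vec in vectors.items():
            fout.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


# Usage with two toy 3-dimensional vectors.
path = os.path.join(tempfile.mkdtemp(), "toy_word2vec.txt")
write_word2vec_text({"human": [0.1, 0.2, 0.3], "computer": [0.4, 0.5, 0.6]}, path)
with open(path, encoding="utf-8") as fin:
    lines = fin.read().splitlines()
print(lines[0])  # 2 3
```

Unlike a pickle, the first line here is ordinary text, which is exactly what load_word2vec_format parses first.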

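Gensim has two formats for storing word2vec models: the KeyedVectors format from the original word2vec implementation, and a format that additionally stores the hidden weights, vocabulary frequencies, etc. model.save() writes the second, full format, which the conversion script cannot read. Examples and details can be found in the documentation. The script word2vec2tensor.py expects the original format and loads the model with load_word2vec_format (see the script's code).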
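A rough, stdlib-only sketch of what the conversion step does: read a text-format word2vec file and write one TSV of vectors plus one TSV of words for the Embedding Projector. The function word2vec_text_to_projector_tsv is hypothetical and is not the script's actual code; the _tensor.tsv / _metadata.tsv suffixes follow, as far as I know, the file names the gensim script produces from its -o prefix.

```python
import os
import tempfile
from typing import Tuple


def word2vec_text_to_projector_tsv(model_path: str, output_prefix: str) -> Tuple[str, str]:
    """Convert a text-format word2vec file into vector and metadata TSV files (hypothetical helper)."""
    tensor_path = output_prefix + "_tensor.tsv"
    metadata_path = output_prefix + "_metadata.tsv"
    with open(model_path, encoding="utf-8") as fin, \
         open(tensor_path, "w", encoding="utf-8") as ftensor, \
         open(metadata_path, "w", encoding="utf-8") as fmeta:
        # The header must be plain text "<count> <dim>"; a pickled model already fails here.
        count, dim = (int(x) for x in fin.readline().split())
        written = 0
        for line in fin:
            parts = line.split()  # also tolerates a trailing space before the newline
            if not parts:
                continue
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
            fmeta.write(word + "\n")               # one word per line, same order as vectors
            ftensor.write("\t".join(values) + "\n")  # one tab-separated vector per line
            written += 1
        if written != count:
            raise ValueError(f"header says {count} words, found {written}")
    return tensor_path, metadata_path


# Usage with a tiny hand-written text-format file.
workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "toy_word2vec.txt")
with open(model_path, "w", encoding="utf-8") as f:
    f.write("2 3\nhuman 0.1 0.2 0.3\ncomputer 0.4 0.5 0.6\n")
tensor_path, metadata_path = word2vec_text_to_projector_tsv(model_path, os.path.join(workdir, "my_model"))
with open(metadata_path, encoding="utf-8") as f:
    words = f.read().splitlines()
with open(tensor_path, encoding="utf-8") as f:
    rows = f.read().splitlines()
print(words)  # ['human', 'computer']
```

Keeping the words in a separate single-column file, in the same row order as the vectors, is what lets the projector label each point.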

Regarding "python - How to visualize Gensim Word2vec embeddings in Tensorboard Projector", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69234978/
