
python - Different models with gensim Word2Vec on Python

Reposted. Author: 行者123. Updated: 2023-11-30 22:54:31

I am trying to apply the word2vec model implemented in the gensim library in Python. I have a list of sentences (each sentence is a list of words).

For example, take:

sentences=[['first','second','third','fourth']]*n

I built two identical models:

model = gensim.models.Word2Vec(sentences, min_count=1, size=2)
model2 = gensim.models.Word2Vec(sentences, min_count=1, size=2)

I noticed that the two models are sometimes identical and sometimes different, depending on the value of n.

For example, with n=100 I get:

print(model['first']==model2['first'])
True

Whereas with n=1000:

print(model['first']==model2['first'])
False

How is this possible?

Thanks a lot!

Best answer

According to the gensim documentation, there is some randomization when you run Word2Vec:

seed = for the random number generator. Initial vectors for each word are seeded with a hash of the concatenation of word + str(seed). Note that for a fully deterministically-reproducible run, you must also limit the model to a single worker thread, to eliminate ordering jitter from OS thread scheduling.
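The seeding idea in that passage can be illustrated with a small standard-library sketch. This is not gensim's implementation: the helper `seeded_vector` and the use of `zlib.crc32` as the hash are assumptions made only for illustration. It shows that a vector derived from a hash of `word + str(seed)` is repeatable for a given seed and changes when the seed changes.

```python
import random
import zlib


def seeded_vector(word, seed, size=2):
    """Hypothetical helper: derive an initial vector from a hash of word + str(seed)."""
    # zlib.crc32 is a stand-in hash chosen for this sketch because it is stable across runs.
    key = zlib.crc32((word + str(seed)).encode("utf-8"))
    rng = random.Random(key)
    # Small values centred on zero, scaled down by the vector size.
    return [(rng.random() - 0.5) / size for _ in range(size)]


same_a = seeded_vector("first", seed=1234)
same_b = seeded_vector("first", seed=1234)
other = seeded_vector("first", seed=4321)

print(same_a == same_b)  # compares two vectors built with the same word and seed
print(same_a == other)   # compares vectors built with different seeds
```

The starting vectors are only one source of variation; the thread-ordering effect described in the same passage is separate and is not modelled here.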

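So, if you want reproducible results, you need to set the seed (and, as the documentation quoted above notes, a fully deterministic run also requires a single worker thread):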

In [1]: import gensim

In [2]: sentences=[['first','second','third','fourth']]*1000

In [3]: model1 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2)

In [4]: model2 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2)

In [5]: print(all(model1['first']==model2['first']))
False

In [6]: model3 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2, seed = 1234)

In [7]: model4 = gensim.models.Word2Vec(sentences, min_count = 1, size = 2, seed = 1234)

In [11]: print(all(model3['first']==model4['first']))
True

Regarding "python - Different models with gensim Word2Vec on Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37745250/
