python - Trouble loading GloVe 840B 300d vectors

Reposted. Author: 行者123. Updated: 2023-11-28 17:08:53

The format seems to be that each line is a string like 'word number number ...', so it should be easy to split. But when I split the lines with the script below

import numpy as np

def loadGloveModel(gloveFile):
    print("Loading Glove Model")
    f = open(gloveFile, 'r')
    model = {}
    for line in f:
        # split() with no argument splits on any run of whitespace
        splitLine = line.split()
        word = splitLine[0]
        embedding = np.array([float(val) for val in splitLine[1:]])
        model[word] = embedding
    print("Done.", len(model), "words loaded!")
    return model

I load glove.840B.300d.txt, but an error occurs. When I printed splitLine, I got:

['contact', 'name@domain.com', '0.016426', '0.13728', '0.18781', '0.75784', '0.44012', '0.096794' ... ]

['.', '.', '.', '.', '0.033459', '-0.085658', '0.27155', ...]

Note that this script works fine with glove.6B.*

Best answer

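The code works fine for these files: glove.6B.*d.txt and glove.42B.*d.txt, but not for glove.840B.300d.txt. That is because glove.840B.300d.txt contains words with spaces in them. For example, it has a token like '. . .', with spaces between the dots. I solved this by changing this line: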

splitLine = line.split()

to

splitLine = line.split(' ')
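
The difference between the two calls can be seen on a small made-up line. This is only a sketch: the line below and the non-breaking space inside its token are assumptions chosen for illustration, not content copied from the GloVe file. With no argument, split() breaks on every kind of whitespace, while split(' ') breaks only on the plain space character.

```python
# Hypothetical line: the token "a\u00a0b" contains a non-breaking space,
# followed by three made-up vector components.
line = "a\u00a0b 0.1 0.2 0.3\n"

# No argument: splits on any whitespace, so the token is cut in two.
all_whitespace = line.split()

# Explicit ' ': splits on the plain space only, so the token stays whole.
space_only = line.rstrip("\n").split(" ")

print(all_whitespace)
print(space_only)
```

In the first result the token is broken into two fields, which shifts the numbers and makes the float conversion fail; in the second it stays in position 0.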

So your code should look like this:

import numpy as np

def loadGloveModel(gloveFile):
    print("Loading Glove Model")
    model = {}
    # The encoding argument to open() requires Python 3
    with open(gloveFile, 'r', encoding='utf8') as f:
        for line in f:
            # Split on the single space character only, not on all whitespace
            splitLine = line.rstrip('\n').split(' ')
            word = splitLine[0]
            embedding = np.asarray(splitLine[1:], dtype='float32')
            model[word] = embedding
    print("Done.", len(model), "words loaded!")
    return model
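
A different way to guard against words that contain spaces is to split from the right and treat the last 300 fields as the vector. This is a swapped-in technique, not the answer's method. The helper name parse_glove_line and its dim parameter are hypothetical, and it returns plain Python floats so the sketch has no dependencies.

```python
def parse_glove_line(line, dim=300):
    """Hypothetical helper: split one GloVe text line into (word, vector).

    Assumes the last `dim` space-separated fields are the vector
    components; everything before them is the word, spaces included.
    """
    # rsplit performs at most `dim` splits, starting from the right.
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError("expected a word followed by %d numbers" % dim)
    word = parts[0]
    vector = [float(val) for val in parts[1:]]
    return word, vector


# Made-up two-dimensional line whose word contains plain spaces.
word, vector = parse_glove_line(". . . 0.5 -1.0\n", dim=2)
print(repr(word), vector)
```

Inside a loader, the result could be stored as model[word] = np.asarray(vector, dtype='float32').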

Regarding python - Trouble loading GloVe 840B 300d vectors, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49083826/