How do I load pre-trained word embeddings into a Keras Embedding layer?
I downloaded glove.6B.50d.txt (from the glove.6B.zip file at https://nlp.stanford.edu/projects/glove/), but I'm not sure how to add it to a Keras Embedding layer. See: https://keras.io/layers/embeddings/
Best Answer
You need to pass an embeddingMatrix to the Embedding layer, like this:

Embedding(vocabLen, embDim, weights=[embeddingMatrix], trainable=isTrainable)
- vocabLen: the number of tokens in your vocabulary
- embDim: the embedding vector dimension (50 in this example)
- embeddingMatrix: the embedding matrix built from glove.6B.50d.txt
- isTrainable: whether you want the embeddings to be trainable or want to freeze the layer

glove.6B.50d.txt is a list of space-separated values: a word token followed by its (50) embedding values, e.g. 0.418 0.24968 -0.41242 ...
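As a quick sanity check on that file layout (a minimal sketch, not part of the original answer; the file path is a placeholder), you can parse the first line and confirm it splits into one token plus 50 float values:

import numpy as np

# Parse the first line of the GloVe file: one token followed by 50 floats.
# The path below is a placeholder; point it at your local glove.6B.50d.txt.
with open("/path/to/glove.6B.50d.txt", "r") as f:
    record = f.readline().strip().split()
    token, vector = record[0], np.array(record[1:], dtype=np.float64)
    print(token, vector.shape)  # prints the token and (50,)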
To create the pretrainedEmbeddingLayer from the GloVe file:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# Prepare Glove File
def readGloveFile(gloveFile):
    with open(gloveFile, 'r') as f:
        wordToGlove = {}  # map from a token (word) to its GloVe embedding vector
        wordToIndex = {}  # map from a token to an index
        indexToWord = {}  # map from an index to a token

        for line in f:
            record = line.strip().split()
            token = record[0]  # take the token (word) from the text line
            wordToGlove[token] = np.array(record[1:], dtype=np.float64)  # associate the GloVe embedding vector with that token (word)

        tokens = sorted(wordToGlove.keys())
        for idx, tok in enumerate(tokens):
            kerasIdx = idx + 1  # 0 is reserved for masking in Keras
            wordToIndex[tok] = kerasIdx  # associate an index with a token (word)
            indexToWord[kerasIdx] = tok  # associate a token (word) with an index; inverse of the dictionary above

    return wordToIndex, indexToWord, wordToGlove

# Create Pretrained Keras Embedding Layer
def createPretrainedEmbeddingLayer(wordToGlove, wordToIndex, isTrainable):
    vocabLen = len(wordToIndex) + 1  # adding 1 to account for masking
    embDim = next(iter(wordToGlove.values())).shape[0]  # works with any GloVe dimension (e.g. 50)

    embeddingMatrix = np.zeros((vocabLen, embDim))  # initialize with zeros
    for word, index in wordToIndex.items():
        embeddingMatrix[index, :] = wordToGlove[word]  # fill in: word index -> GloVe word embedding

    embeddingLayer = Embedding(vocabLen, embDim, weights=[embeddingMatrix], trainable=isTrainable)
    return embeddingLayer

# usage
wordToIndex, indexToWord, wordToGlove = readGloveFile("/path/to/glove.6B.50d.txt")
pretrainedEmbeddingLayer = createPretrainedEmbeddingLayer(wordToGlove, wordToIndex, False)
model = Sequential()
model.add(pretrainedEmbeddingLayer)
...
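As a hedged usage sketch (not from the original answer), once the layer is in the model you can map a tokenized sentence to the indices the embedding expects via wordToIndex; words not found in the GloVe vocabulary fall back to index 0, whose embedding row is all zeros in the matrix built above:

import numpy as np

# Hypothetical example sentence; any tokenized text works the same way.
sentence = ["the", "quick", "brown", "fox"]
indices = [wordToIndex.get(word, 0) for word in sentence]  # 0 = padding/mask index

x = np.array([indices])      # shape: (1, sequence_length)
embedded = model.predict(x)  # shape: (1, sequence_length, 50) if the model so far
                             # contains only the embedding layer
print(embedded.shape)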
Regarding python - How to create a Keras Embedding layer from a pre-trained word embedding dataset?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48677077/