python - Keras Embedding layer masking. Why does input_dim need to be |vocabulary| + 2?


Reposted by: 行者123 Updated: 2023-12-01 18:56:56


In the Keras documentation for the Embedding layer, https://keras.io/layers/embeddings/ , the explanation of mask_zero is:

mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal |vocabulary| + 2).

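Why does input_dim need to be 2 + the number of words in the vocabulary? Assuming 0 is masked and cannot be used, shouldn't it just be 1 + the number of words? What is the other extra entry for?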

Best answer

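I believe the docs are a bit misleading here. In the normal case you map your n input data indices [0, 1, 2, ..., n-1] to vectors, so your input_dim should be as large as the number of elements you have: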

input_dim = len(vocabulary_indices)

An equivalent (but slightly confusing) way to say this, and the way the docs put it, is

1 + maximum integer index occurring in the input data.

input_dim = max(vocabulary_indices) + 1
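
To make the equivalence of the two formulas concrete, here is a minimal sketch in plain Python. The toy word list and the variable names are assumptions made for illustration; only `vocabulary_indices` and `input_dim` come from the answer.

```python
# Hypothetical toy vocabulary; the words are illustrative only.
words = ["the", "cat", "sat", "down"]

# Without masking, the n words map to the contiguous indices 0 .. n-1.
vocabulary_indices = list(range(len(words)))  # [0, 1, 2, 3]

# Both ways of counting give the same table size when indices are contiguous.
input_dim_by_count = len(vocabulary_indices)
input_dim_by_max = max(vocabulary_indices) + 1

print(input_dim_by_count, input_dim_by_max)
```

The two values agree only because the indices start at 0 and have no gaps; with gaps, the max-based formula is the one that guarantees a row for every index.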

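If you enable masking, the value 0 is treated differently, so you shift your n indices up by one and the inputs now range over [0, 1, 2, ..., n-1, n]. You therefore need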

input_dim = len(vocabulary_indices) + 1

or alternatively

input_dim = max(vocabulary_indices) + 2
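
The masked case can be sketched the same way. This is an illustration under the assumption that index 0 is reserved for padding and every word index is shifted up by one; the names `PAD` and `masked_indices` are hypothetical and not part of Keras.

```python
# Hypothetical toy vocabulary; the words are illustrative only.
words = ["the", "cat", "sat", "down"]
vocabulary_indices = list(range(len(words)))  # unmasked indices: [0, 1, 2, 3]

PAD = 0  # index reserved for padding when masking is enabled

# Shift every word index up by one so that no word uses the padding index.
masked_indices = [i + 1 for i in vocabulary_indices]  # [1, 2, 3, 4]
assert PAD not in masked_indices

# The embedding table needs one row per index from 0 up to the largest index.
input_dim = max(masked_indices) + 1

print(input_dim)
print(input_dim == len(vocabulary_indices) + 1)
print(input_dim == max(vocabulary_indices) + 2)
```

Both formulas from the answer give the same result here because they are expressed in terms of the original, unshifted indices.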

The docs become especially confusing here, as they say

(input_dim should equal |vocabulary| + 2)

I would interpret |x| as the cardinality of a set (equivalent to len(x)), but the authors seem to mean

2 + maximum integer index occurring in the input data.
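
The gap between the two readings can be checked on a small example. The sketch below assumes a 4-word vocabulary encoded as indices 1..4 with 0 as padding; the helper `min_input_dim` and the sample sequences are hypothetical and only serve to compare the numbers.

```python
def min_input_dim(padded_sequences):
    """Smallest table size that has a row for every index in the data."""
    return max(max(seq) for seq in padded_sequences) + 1

# Hypothetical padded input: word indices 1..4, with 0 used as padding.
padded_sequences = [
    [1, 2, 3, 0],
    [4, 2, 0, 0],
]
vocabulary_size = 4  # |vocabulary| read as a cardinality

needed = min_input_dim(padded_sequences)   # rows actually required
cardinality_reading = vocabulary_size + 2  # the docs' formula read literally

print(needed, cardinality_reading)
```

Read literally, the docs' formula asks for one more row than the data requires. A table that is larger than necessary still has a row for every index, so it would only leave a row unused rather than cause an error.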

Regarding "python - Keras Embedding layer masking. Why does input_dim need to be |vocabulary| + 2?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43227938/


